Preface



Formal methods provide system designers with the possibility to analyze system 
models and reason about them with mathematical precision and rigor. The use 
of formal methods is not restricted to the early development phases of a system, 
though. The different testing phases can also benefit from them to ease the pro- 
duction and application of effective and efficient tests. Many still regard formal 
methods and testing as an odd combination. Formal methods traditionally aim 
at verifying and proving correctness (a typical academic activity), while testing 
shows only the presence of errors (this is what practitioners do). Nonetheless, 
there is an increasing interest in the use of formal methods in software testing. It 
is expected that formal approaches are about to make a major impact on emerg- 
ing testing technologies and practices. Testing proves to be a good starting point 
for introducing formal methods in the software development process. 

This volume contains the papers presented at the 3rd Workshop on Formal
Approaches to Testing of Software, FATES 2003, which was held in affiliation with
the IEEE/ACM Conference on Automated Software Engineering (ASE 2003). This
year, FATES received 43 submissions. Each submission was reviewed by at least 
three independent reviewers from the program committee with the help of ad- 
ditional reviewers. Based on their evaluations, 18 papers submitted by authors 
from 13 different countries were selected for presentation at the workshop. The 
papers present different approaches to using formal methods in software test- 
ing. One of the main themes is the generation of an efficient and effective set of 
test cases from a formal description. Different models and formalisms are used, 
such as finite state machines, input/output transition systems, timed automata, 
UML, and Abstract State Machines. An increasing number of test methodologies
(re)use techniques from model checking. The prospects for using formal
methods to improve software quality and reduce the cost of software testing 
are encouraging. But more efforts are needed, both in developing new theories 
and making existing methods applicable to the current practice of software de- 
velopment projects. Without doubt, coming FATES workshops will continue to 
contribute to the growing and evolving research activities in this field. 

We wish to express our gratitude to the authors for their valuable contribu- 
tions. We thank the program committee and the additional reviewers for their 
support in the paper selection process. Last but not least, we thank May Haydar 
who helped in organizing the proceedings and all persons from the Centre de 
Recherche Informatique de Montreal and the organizing committee of ASE 2003 
who were involved in arranging local matters. 
October 2003



Alexandre Petrenko, Andreas Ulrich 
FATES 2003 Co-chairs 
Black-Box Testing of Grey-Box Behavior



Benjamin Tyler and Neelam Soundarajan 

Computer and Information Science 
Ohio State University, Columbus, OH 43210, USA 
{tyler,neelam}@cis.ohio-state.edu

Abstract. Object-oriented frameworks are designed to provide function- 
ality common to a variety of applications. Developers use these frame- 
works in building their own specialized applications, often without having 
the source code of the original framework. Unfortunately, the interactions 
between the framework components and the new application code can 
lead to behaviors that could not be predicted even if valid black-box spec- 
ifications were provided for the framework components. What is needed 
are grey-box specifications that include information about sequences of 
method calls made by the original framework code. Our focus is on how 
to test frameworks against such specifications, which requires the ability 
to monitor such method calls made by the framework during testing. 
The problem is that without the source code of the framework, we can- 
not resort to code instrumentation to track these calls. We develop an 
approach that allows us to do this, and demonstrate it on a simple case 
study. 



1 Introduction 

An important feature of object-oriented (OO) languages is the possibility of
enriching or extending the functionality of an OO system [18] by providing, in
derived classes, suitable definitions or re-definitions for some of the methods of some
of the classes of the given system. Application frameworks [9,13,20] provide
compelling examples of such enrichment. The framework includes a number of hooks,
methods that are not (necessarily) defined in the framework but are invoked in 
specific, and often fairly involved, patterns by the polymorphic or template meth- 
ods [11] defined in the framework. An application developer can build a complete 
customized application by simply providing appropriate (re-)definitions for the 
hook methods, suited to the needs of the particular application. The calls to the 
hook methods from the template methods are dispatched to the methods defined 
by the application developer, so that the template methods also exhibit behavior 
tailored to the particular application. Since the patterns of hook method calls
implemented in the template methods are often among the most intricate parts of
the overall application, a well-designed framework can be of great help in
building applications, and maximizes the amount of reuse among the applications
built on it. Our goal is to investigate approaches to perform specification-based
testing of such frameworks. 

Testing such systems should clearly include testing these patterns of hook
method calls. That is, we are interested in testing what is called the grey-box
behavior [2,5,10,22] of OO systems, not just their black-box behavior. If we had
access to the source code of the template methods, we could do this by
instrumenting that code, inserting suitable instructions at appropriate points to
record information about the hook method calls; for example, just prior to each
such call, we could record the identity of the method being called, the values of
the arguments, etc. But framework vendors, because of proprietary considerations,
often will not provide the source code of their systems. Hence the challenge
we face is to find a way to test the grey-box behavior of template methods
without being able to make any changes to their code, such as adding “monitoring
code”, and indeed without even having the source code of the system.

In this paper, we develop an approach that allows us to do this. The key idea 
underlying our approach is to exploit polymorphism to intercept hook method 
calls made by the template method being tested. When the hook method call is 
intercepted, the testing system will record the necessary information about the 
call, and then allow “normal” execution to resume. In a sense, the testing system 
that we build for testing a given framework is itself an application built on the 
framework being tested. This “application” can be generated automatically given 
information about the structure of the various classes that are part of the frame- 
work including the names and parameter types of the various methods and their 
specifications, and the compiled code of the framework. We have implemented 
a prototype test system generator that accomplishes this task. We present some 
details about our prototype later in the paper. 

1.1 Black-Box vs. Grey-Box Behavior 

How do we specify grey-box behavior? Standard specifications [14,18] in terms 
of pre- and post- conditions for each method of each class in the system only 
specify the black-box behavior of the method in question. Consider a template 
(or polymorphic, we will use the terms interchangeably) method t(). There is 
no information in the standard specification of t() about the hook method calls 
that t() makes during execution. We can add such information by introducing 
a trace variable [5,22], call it τ, as an auxiliary variable [19] on which we record
information about the hook method calls t() makes. When the method starts
execution, τ will be the empty sequence since at the start, t() has not made any
such calls. As t() executes, information about each hook method call it makes
will be recorded on τ. We can then specify the grey-box behavior by including,
in the post-condition of t(), not just information on the state of the object in
question when t() terminates, but also about the value of τ, i.e., about the hook
method calls t() made during its execution; we will see examples of this later in 
the paper. Given such a grey-box specification, the key question we address is, 
how do we test t(), without accessing or modifying its code, to see if its actual 
grey-box behavior satisfies the specification? 

1.2 Comparison to Related Work 

A number of authors have addressed problems related to testing of polymorphic 
interactions [1,21,3,17] in OO systems. In all of this work, the approach is to
try to test the behavior of a polymorphic method t() by using objects of all or
many different derived classes to check whether t() behaves appropriately in each 
case, given the different hook method definitions to which the calls in t() will 
be dispatched, depending on the particular derived class that the given object is 
an instance of. Such an approach is not suitable for testing frameworks. We are 
interested in testing the framework independently of any application that may 
be built on it, i.e., independently of particular derived classes and particular 
definitions of the hook methods. The only suitable way to do this is to test it 
directly to see that the actual sequences of hook method calls it makes during the 
tests are consistent with its grey-box specification. The other key difference is 
our focus on testing polymorphic methods without having access to their source 
code. 

Another important question, of course, has to do with coverage. Typical 
coverage criteria that have been proposed [1,21,6] for testing polymorphic code 
have been concerned with measuring the extent to which, for example, every 
hook method call that appears in the polymorphic method is dispatched, in some 
test run, to each definition of the hook method (in the various derived classes). 
Clearly a criterion of this kind would be inappropriate for our purposes since our 
goal is to test the polymorphic methods of the framework independently of any 
derived classes. What we should aim for instead is to select test cases in such 
a way as to ensure that as many as possible of the sequences of hook method 
calls allowed by the grey-box specifications actually appear in the test runs. 
One problem here, as in any specification-based testing approach, is that the 
specification only specifies what behavior is allowed; there is no requirement that 
the system actually exhibit each behavior allowed by the specification. Hence, 
measuring our coverage by checking the extent to which the different sequences 
of hook method calls allowed by the specification show up in the test runs may 
be too conservative if the framework is not actually capable of exhibiting some of 
those sequences. Another approach, often used with specification-based testing, 
is based on partitioning of the input space, i.e., the set of values allowed by 
the pre-condition of the method. But partition-based testing suffers from some 
important problems [8,12] that raise concerns about its usefulness. We will return 
to this question briefly in the final section, but we should note that our focus
in this paper is not on coverage criteria but on developing an approach that,
without requiring us to access or modify the source code of a template method,
allows us to check whether the method meets its grey-box specification during a
test run.

1.3 Contributions 

The main contributions of the paper may be summarized as follows: 

— It identifies the importance of testing grey-box behavior of OO systems.

— It develops an approach to testing a system to see if it meets its grey-box 
specification without accessing or modifying the code of the system under 
test. 

— It illustrates the approach by applying it to a simple case study. 
In Sect. 2 we consider how to specify grey-box behavior. In Sect. 3, we develop
our approach to testing against such specifications without accessing the code. 
We use a simple case study as a running example in Sects. 2 and 3. In Sect. 4 
we present some details of our prototype system. In Sect. 5, we summarize our 
approach and consider future work. 



2 Grey-Box Specifications 

2.1 Limitations of Black-Box Specifications 

Consider the Eater class, a simple class whose instances represent entities that 
lead sedentary lives consisting of eating donuts and burgers, depicted in Fig. 1. 
The methods Eat_Donuts() and Eat_Burgers() simply update the single member
variable cals_Eaten, which keeps track of how many calories have been consumed;
the parameter n indicates how many donuts or burgers are to be consumed.
Pig_Out() is a template method and invokes the hook methods Eat_Donuts() and 
Eat_Burgers(). 

class Eater {
    protected int cals_Eaten = 0;
    public void Eat_Donuts(int n) {
        cals_Eaten = cals_Eaten + 200 * n; }
    public void Eat_Burgers(int n) {
        cals_Eaten = cals_Eaten + 400 * n; }
    public final void Pig_Out() {
        Eat_Donuts(2); Eat_Burgers(2); }
}



Fig. 1. Base class Eater. 



Let us now consider the specification of Eater’s methods (Fig. 2). These can 
be specified as usual in terms of pre- and post-conditions describing the effect 
of each method on the member variables of the class. Here, we use the prime (') 
notation in the post-conditions to refer to the value of the variable in question 
at the time the method was invoked. Thus the specifications of Eat_Donuts() and
Eat_Burgers() state that each of them increments the value of cals_Eaten
appropriately. Given the behaviors of these methods, it is easy to see that the
template method Pig_Out() will meet its specification that it increments
cals_Eaten by 1200.

Now suppose that the implementers of Eater provide only the compiled binary 
file and the black-box specification shown in Fig. 2, but not the source code in 
Fig. 1, to developers who wish to incorporate Eater in their own systems. What 
can such developers safely say about their own new classes that are extensions 
of Eater? Let us examine this question using the EaterJogger class, depicted in 
Fig. 3. EaterJogger, which is a derived class of Eater, keeps track not only of 
pre.Eat_Donuts(n)    ≡  n > 0
post.Eat_Donuts(n)   ≡  cals_Eaten = cals_Eaten' + 200 * n           (2.1)

pre.Eat_Burgers(n)   ≡  n > 0
post.Eat_Burgers(n)  ≡  cals_Eaten = cals_Eaten' + 400 * n           (2.2)

pre.Pig_Out()        ≡  true
post.Pig_Out()       ≡  cals_Eaten = cals_Eaten' + 1200              (2.3)




Fig. 2. Eater’s black-box specification. 





class EaterJogger extends Eater {
    protected int cals_Burned = 0;
    public void Jog() {
        cals_Burned = cals_Burned + 500; }

    public void Eat_Donuts(int n) {
        cals_Eaten = cals_Eaten + 200 * n;
        cals_Burned = cals_Burned + 5 * n; }
    public void Eat_Burgers(int n) {
        cals_Eaten = cals_Eaten + 400 * n;
        cals_Burned = cals_Burned + 15 * n; }
}



Fig. 3. The derived class EaterJogger. 



cals_Eaten but also the new data member cals_Burned. The new method Jog() 
simply increments cals_Burned. More important, Eat_Donuts() and Eat_Burgers() 
have been redefined to update cals_Burned. 

What can we say about the behavior of Pig_Out() in this derived class? More 
precisely the question is, if ej is an object of type EaterJogger, what effect will the 
call ej.Pig_Out() have on ej.cals_Eaten and ej.cals_Burned? The calls in Pig_Out() 
to the hook methods will be dispatched to the methods redefined in EaterJogger. 
If we had access to the body of Pig_Out() (defined in the base class), we could see
that it invokes Eat_Donuts(2) and then Eat_Burgers(2), and hence conclude, given 
the behaviors of these methods as redefined in EaterJogger, that in this class, 
Pig_Out() would increment cals_Eaten by 1200 and cals_Burned by 40. However, 
we have assumed that we only have access to Eater’s black-box specification 
shown in Fig. 2, but not the source code of Pig_Out(). 
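The figures 1200 and 40 can be checked by compiling the classes of Figs. 1 and 3 together and running the template method once. The following is only a sketch for that purpose: the two classes are transcribed from the figures (with Jog() left out), and the demo class EaterJoggerDemo is our own addition, not part of the paper.

```java
// Eater, as in Fig. 1.
class Eater {
    protected int cals_Eaten = 0;
    public void Eat_Donuts(int n) { cals_Eaten = cals_Eaten + 200 * n; }
    public void Eat_Burgers(int n) { cals_Eaten = cals_Eaten + 400 * n; }
    public final void Pig_Out() { Eat_Donuts(2); Eat_Burgers(2); }
}

// EaterJogger, as in Fig. 3 (Jog() omitted, since Pig_Out() never calls it).
class EaterJogger extends Eater {
    protected int cals_Burned = 0;
    @Override public void Eat_Donuts(int n) {
        cals_Eaten = cals_Eaten + 200 * n;
        cals_Burned = cals_Burned + 5 * n;
    }
    @Override public void Eat_Burgers(int n) {
        cals_Eaten = cals_Eaten + 400 * n;
        cals_Burned = cals_Burned + 15 * n;
    }
}

// Hypothetical demo class (our name, not the paper's).
class EaterJoggerDemo {
    public static void main(String[] args) {
        EaterJogger ej = new EaterJogger();
        ej.Pig_Out();  // the hook calls are dispatched to the EaterJogger redefinitions
        // Eaten: 2*200 + 2*400 = 1200; burned: 2*5 + 2*15 = 40.
        System.out.println(ej.cals_Eaten + " " + ej.cals_Burned);
    }
}
```

The point of the surrounding discussion is that this calculation is only possible because the body of Pig_Out() is visible here.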

Behavioral subtyping [15] provides part of the answer to this question. In 
essence, a derived class D is a behavioral subtype of its base class B if every 
method redefined in D satisfies its B-specification. If this requirement is met 
then we can be sure that in the derived class, a template method t() will meet 
its original specification ((2.3) in the case of Pig_Out()). This is because when 
reasoning about the behavior of t() in the base class, we would have appealed 
to the base class specifications of the hook methods when considering the calls 
in t() to these methods. If these methods, as redefined in D, satisfy those speci- 
fications, then clearly that reasoning still applies when the calls that t() makes 
to these methods are dispatched to the redefined versions in D. Our redefined 
Eat_Donuts() and Eat_Burgers() do clearly satisfy their base class specifications
(2.1) and (2.2), hence Pig_Out() in the derived class will also meet its base class 
specification (2.3). 

But this is only part of the answer. The redefined hook methods not only 
satisfy their base class specifications but exhibit richer behavior in terms of 
their effect on the new variable cals_Burned, which is easily specified (Fig. 4). 
Indeed, the whole point of redefining the hook methods was to achieve this 
richer behavior; after all, if all we cared about was the base class behavior, there 
would have been no need to redefine them at all. Not only is the hook methods’ 
behavior enriched through their redefinition, but the behavior of the template 
method in the derived class will also be enriched even though its code was not 
changed. How then, can we reason about this richer behavior of the template 
method? 



pre.Eat_Donuts(n)    ≡  n > 0
post.Eat_Donuts(n)   ≡  cals_Eaten = cals_Eaten' + 200 * n
                        ∧ cals_Burned = cals_Burned' + 5 * n         (4.1)

pre.Eat_Burgers(n)   ≡  n > 0
post.Eat_Burgers(n)  ≡  cals_Eaten = cals_Eaten' + 400 * n
                        ∧ cals_Burned = cals_Burned' + 15 * n        (4.2)



Fig. 4. Specifications for EaterJogger’s hook methods.



If we examine the specifications for the redefined hook methods shown in 
Fig. 4, and (2.3), the black-box specification of Pig_Out(), can we arrive at the 
richer behavior of Pig_Out() in EaterJogger, in particular that it will increment 
cals_Burned by 40? The answer is clearly no, since there is nothing in (2.3) that 
tells us which, if any, hook methods Pig_Out() calls and how many times and 
with what argument values. Given (2.3), it is possible that it called Eat_Donuts() 
once with 6 as the argument and never called Eat_Burgers(); or Eat_Burgers() 
once with 3 as the argument, and Eat_Donuts() zero times; it is even possible 
that Pig_Out() didn’t call either hook method even once and instead directly in- 
cremented cals_Eaten by 1200. Even an implementation that called Eat_Donuts() 
ten times with 2 as the argument each time and then decremented cals_Eaten by 
2800 would work. All of these and more are possible, and depending on which 
of these Pig_Out() actually does, its effect on cals_Burned will be different. Note 
that for all of these cases, the original behavior (2.3) is still satisfied. That is 
ensured by behavioral subtyping. But if we are to arrive at the richer behavior 
of Pig_Out(), we need not just the black-box behavior of the template method 
in the base class as specified in (2.3), but also its grey-box behavior. 
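The alternatives just listed can be made concrete with a small experiment. The sketch below is not the paper's code: to let one file hold several candidate bodies for Pig_Out(), it passes the body to Eater as a Consumer (a hypothetical device standing in for the hidden implementation), and the names PigOutVariants, effect, and the five constants are ours. Each candidate satisfies (2.3), yet the effect on cals_Burned in EaterJogger differs.

```java
import java.util.function.Consumer;

// Variant of Eater whose Pig_Out() body is supplied from outside.
// ASSUMPTION: this constructor is an illustration device, not in the paper.
class Eater {
    protected int cals_Eaten = 0;
    private final Consumer<Eater> pigOutBody;
    Eater(Consumer<Eater> pigOutBody) { this.pigOutBody = pigOutBody; }
    public void Eat_Donuts(int n) { cals_Eaten += 200 * n; }
    public void Eat_Burgers(int n) { cals_Eaten += 400 * n; }
    public final void Pig_Out() { pigOutBody.accept(this); }
}

// Hook redefinitions as in Fig. 3.
class EaterJogger extends Eater {
    protected int cals_Burned = 0;
    EaterJogger(Consumer<Eater> body) { super(body); }
    @Override public void Eat_Donuts(int n) { cals_Eaten += 200 * n; cals_Burned += 5 * n; }
    @Override public void Eat_Burgers(int n) { cals_Eaten += 400 * n; cals_Burned += 15 * n; }
}

class PigOutVariants {
    // Five bodies that all increase cals_Eaten by 1200, as (2.3) requires.
    static final Consumer<Eater> FIG1 = e -> { e.Eat_Donuts(2); e.Eat_Burgers(2); };
    static final Consumer<Eater> DONUTS_ONLY = e -> e.Eat_Donuts(6);
    static final Consumer<Eater> BURGERS_ONLY = e -> e.Eat_Burgers(3);
    static final Consumer<Eater> NO_HOOKS = e -> e.cals_Eaten += 1200;
    static final Consumer<Eater> TEN_DONUTS = e -> {
        for (int i = 0; i < 10; i++) e.Eat_Donuts(2);
        e.cals_Eaten -= 2800;
    };

    // Returns {cals_Eaten, cals_Burned} after one Pig_Out() on a fresh EaterJogger.
    static int[] effect(Consumer<Eater> body) {
        EaterJogger ej = new EaterJogger(body);
        ej.Pig_Out();
        return new int[] { ej.cals_Eaten, ej.cals_Burned };
    }

    public static void main(String[] args) {
        System.out.println("Fig. 1 body:  burned " + effect(FIG1)[1]);
        System.out.println("donuts only:  burned " + effect(DONUTS_ONLY)[1]);
        System.out.println("burgers only: burned " + effect(BURGERS_ONLY)[1]);
        System.out.println("no hooks:     burned " + effect(NO_HOOKS)[1]);
        System.out.println("ten donuts:   burned " + effect(TEN_DONUTS)[1]);
    }
}
```

Under these assumptions the five bodies burn 40, 30, 45, 0, and 100 calories respectively, while each increases cals_Eaten by exactly 1200.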

2.2 Reasoning with Grey-Box Specifications 

Consider the grey-box specification (5.1) in Fig. 5. Here, τ is the trace of this
template method; τ is the empty sequence, ε, when Pig_Out() begins execution.
Each time Pig_Out() invokes a hook method, we add an element to record this
hook method invocation. This element contains the name of the hook method
called, the values of the member variables of the Eater class at the time of the call,
their values at the time of the return from this call, the values of any additional
arguments at the time of the call, their values at the time of the return, and the
value of any additional result returned by the call. The grey-box post-condition
gives us information about the value of τ when the method finishes, hence about
the hook method calls it made during its execution. Thus (5.1) states that |τ|,
the length of τ, i.e., the number of elements in it, is 2; that the hook method called
in the first call, recorded in the first element τ[1] of the trace, is Eat_Donuts;
that the argument value passed in this call is 2; the hook method called in the
second call is Eat_Burgers; and the argument passed in this call is 2.

pre.Pig_Out()   ≡  τ = ε                                              (5.1)
post.Pig_Out()  ≡  cals_Eaten = cals_Eaten' + 1200 ∧ |τ| = 2
                   ∧ τ[1].method = "Eat_Donuts" ∧ τ[1].arg = 2
                   ∧ τ[2].method = "Eat_Burgers" ∧ τ[2].arg = 2



Fig. 5. Grey-box specification for Pig_Out in Eater. 

It should be noted that (5.1) does not give us additional information about 
the value that cals_Eaten had at the time of either call or return. While this 
simplifies the specification, it also means that redefinitions of the hook methods 
that depend on the value of cals_Eaten cannot be reasoned about given (5.1). 
This is a tradeoff that we have to make when writing grey-box specifications; 
include full information, resulting in a fairly complex specification; or leave out 
some of the information, foreclosing the possibility of some enrichments (or at 
least of reasoning about such enrichments, which amounts to the same thing in 
the absence of access to the source code of the template method). 
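For testing purposes, a specification such as (5.1) has to be turned into an executable check. One possible encoding is sketched below; the class names TraceRec and PigOutSpec and the representation of the trace as a java.util.List are our assumptions. Matching the reduced form of (5.1), a trace element here carries only the method name and the argument value.

```java
import java.util.Arrays;
import java.util.List;

// One trace element, reduced to the two fields that (5.1) refers to.
// ASSUMPTION: a full element would also record object state at call and return.
final class TraceRec {
    final String method;
    final int arg;
    TraceRec(String method, int arg) { this.method = method; this.arg = arg; }
}

final class PigOutSpec {
    // Pre-condition of (5.1): the trace is empty on entry.
    static boolean pre(List<TraceRec> tau) { return tau.isEmpty(); }

    // Post-condition of (5.1). The paper indexes the trace from 1, Java lists from 0.
    static boolean post(int oldCalsEaten, int calsEaten, List<TraceRec> tau) {
        return calsEaten == oldCalsEaten + 1200
            && tau.size() == 2
            && tau.get(0).method.equals("Eat_Donuts") && tau.get(0).arg == 2
            && tau.get(1).method.equals("Eat_Burgers") && tau.get(1).arg == 2;
    }

    public static void main(String[] args) {
        List<TraceRec> expected = Arrays.asList(
            new TraceRec("Eat_Donuts", 2), new TraceRec("Eat_Burgers", 2));
        List<TraceRec> donutsOnly = Arrays.asList(new TraceRec("Eat_Donuts", 6));
        System.out.println(post(0, 1200, expected));    // trace matches (5.1)
        System.out.println(post(0, 1200, donutsOnly));  // black-box part holds, trace does not
    }
}
```

The second call in main illustrates the difference between the two kinds of specification: the state change alone would pass (2.3), but the trace part of (5.1) rejects it.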

Given this grey-box specification, what can we conclude about the behavior of
Pig_Out() in the derived class? Note first that from (4.1) and (4.2), we can deduce
that EaterJogger.Eat_Donuts() and EaterJogger.Eat_Burgers() satisfy (2.1) and
(2.2), i.e., they satisfy the requirement of behavioral subtyping; hence Pig_Out()
will satisfy (2.3) when invoked on EaterJogger objects. But we can conclude more.
By (5.1), Pig_Out() makes two hook method calls during its execution, first to
Eat_Donuts() with argument value 2, and then to Eat_Burgers() with argument
value 2. Given (4.1) and (4.2), it follows that in EaterJogger, Pig_Out()
will increment cals_Burned by 2 * 5 + 2 * 15 = 40, as specified in Fig. 6.

In [22], we have proposed a set of rules that can be used in the usual fashion
of axiomatic semantics to show: first, that the body of Pig_Out() defined in
Fig. 1 satisfies the grey-box specification (5.1); and second, by using the enrichment
rule to “plug-in” the richer behavior specified in (4.1) and (4.2) for the redefined 
hook methods into (5.1), that in the derived class, the template method will 
satisfy the richer specification (6.1). Here our goal is to test the template method 
to see whether it satisfies its specification, so we now turn to that. 
pre.Pig_Out()   ≡  τ = ε                                              (6.1)
post.Pig_Out()  ≡  cals_Eaten = cals_Eaten' + 1200
                   ∧ cals_Burned = cals_Burned' + 40 ∧ |τ| = 2
                   ∧ τ[1].method = "Eat_Donuts" ∧ τ[1].arg = 2
                   ∧ τ[2].method = "Eat_Burgers" ∧ τ[2].arg = 2

Fig. 6. Grey-box specification for Pig_Out for class EaterJogger. This can be derived 
from (4.1), (4.2), and (5.1). 



2.3 The Challenge: Testing Grey-Box Behavior 
without Source Code 

If we wished to test Pig_Out() against its black-box specification, the task could be
carried out in a standard, straightforward fashion [18]. All we would need to do
is create an object ee of type Eater, check that it satisfies the pre-condition given
in (2.3) (which in this case is vacuous since it is simply true), apply Pig_Out()
on ee, and check, when control returns, whether the post-condition specified in
(2.3) is satisfied. But testing the grey-box behavior (5.1) is more complex. First,
(5.1) refers to τ, and τ is not an actual variable of the class, but an auxiliary
variable introduced for the purpose of specification. We can take care of this by
introducing a trace variable, call it tau, as part of our testing setup and initializing
it to the empty sequence immediately before invoking Pig_Out(). More seriously,
tau needs to be updated whenever Pig_Out() calls one of the hook methods; else,
the value of tau will remain ε and will not satisfy the conditions specified in
(5.1) even if in fact Pig_Out()’s grey-box behavior is in accordance with (5.1).
The obvious way to update tau would be to examine the code (in Fig. 1) of 
Pig_Out(), identify all the calls that appear in this code body to hook methods, 
and insert appropriate instructions into the body of Pig_Out() at these points to 
update tau appropriately. Thus we would replace the call Eat_Donuts(2) by: 

Eat_Donuts(2); tau = tau ^ (Eat_Donuts, 2); 

where ^ denotes appending the specified element to tau; calls to Eat_Burgers()
would be handled similarly. Once we insert these instructions, we go through our 
testing procedure. When Pig_Out() finishes, tau would indeed have been updated 
appropriately, and we can check whether the post-condition in (5.1) is satisfied. 
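As a point of comparison for what follows, the instrumentation route can be sketched in a few lines. This is an illustration under our own assumptions: the class name InstrumentedEater, the demo class, and the reduction of trace elements to strings are not from the paper, and the approach presupposes exactly what the paper says is unavailable, namely editable source for Pig_Out().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// A hand-edited copy of Eater: monitoring code has been inserted into Pig_Out().
class InstrumentedEater {
    protected int cals_Eaten = 0;
    protected final List<String> tau = new ArrayList<>();  // trace, reduced to strings

    public void Eat_Donuts(int n) { cals_Eaten = cals_Eaten + 200 * n; }
    public void Eat_Burgers(int n) { cals_Eaten = cals_Eaten + 400 * n; }

    public final void Pig_Out() {
        Eat_Donuts(2);  tau.add("Eat_Donuts(2)");   // inserted monitoring code
        Eat_Burgers(2); tau.add("Eat_Burgers(2)");  // inserted monitoring code
    }
}

// Hypothetical driver (our name): runs the test procedure described above.
class InstrumentedEaterDemo {
    static boolean run() {
        InstrumentedEater ee = new InstrumentedEater();
        int old_cals_Eaten = ee.cals_Eaten;
        ee.tau.clear();                              // tau = empty sequence
        ee.Pig_Out();
        return ee.cals_Eaten == old_cals_Eaten + 1200
            && ee.tau.equals(Arrays.asList("Eat_Donuts(2)", "Eat_Burgers(2)"));
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```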

As we saw earlier, each element of the trace should record not just the name 
of the hook method called and the argument value passed, but also the state of 
the object at the time of the call as well as when the call returns. Thus what 
we have is incomplete. This does not matter in this example since the grey-box 
specification (5.1) does not refer to any of this additional information. In general 
though, it is necessary to include all of this information in each element of tau; 
and it is straightforward (if a bit tedious) to do this by modifying the above 
instructions appropriately. But this approach does not meet our requirements. 
As we have noted before, we may not have access to the source code of the 
template method we want to test. Therefore, we certainly cannot make changes 
of this kind. The fundamental problem we have to address is, how do we ensure 
that the trace tau is appropriately updated to record the hook-method calls that 
Pig_Out() makes during its execution, without modifying its code, given that
these calls are embedded in that code? In other words, how do we do black-box 
testing of Pig_Out()’s grey-box behavior? 

3 Black-Box Testing of Grey-Box Behavior 

The key problem we face in black-box testing of the grey-box behavior of 
Pig_Out() is that we cannot wait until it finishes execution to try to record 
information about its hook-method calls since, in general, by that point we no 
longer have that information. What we need to do instead is to intercept these 
calls as Pig_Out() makes them. But how can we do that if we are not allowed 
to modify Pig_Out() at the points of these calls? The answer is provided by the 
same mechanism that template methods are designed to exploit, i.e., polymor- 
phism. That is, rather than intercepting the calls by modifying the code of the 
template method, we will redefine the hook methods so that they update the 
trace appropriately whenever they are invoked. 

In Fig. 7 we define our test class, Test_Eater. Since in the post-conditions 
of methods we are allowed to use, by means of primed variables, the values 
that variables had when the method started execution, when testing against 
such specifications we need to save these initial values when a method begins
execution. Thus in the test_Pig_Out() method of Test_Eater, we use old_cals_Eaten
to save the starting value of cals_Eaten. 

class Test_Eater extends Eater {
    protected trace tau;

    public void Eat_Donuts(int n) {   // redefined hook
        traceRec tauel;
        tauel = ... info such as name of method called (Eat_Donuts), param. value (n), etc.
        super.Eat_Donuts(n);          // call original hook
        tauel = ... add info about current state, etc.
        tau.append(tauel); }

    // Eat_Burgers() is similarly redefined.

    public void test_Pig_Out() {
        if (true) {                   // pre-condition of Pig_Out()
            int old_cals_Eaten = this.cals_Eaten;  // allowed, since Test_Eater extends Eater
            tau = ε;
            this.Pig_Out();
            assert (grey-box post-condition of Pig_Out() with appropriate substitutions); }; }
}



Fig. 7. Class Test_Eater. 



Test_Eater is a derived class of Eater, and we have redefined both the hook
methods to update the trace. tau is the trace variable as before, and tauel will
record information about one hook method call, which will be appended to tau
once the call has finished and returned. Let us see how Test_Eater.test_Pig_Out()
works using the sequence call diagram [4] in Fig. 8. The six vertical lines, each
labeled at the top with the name of a method (the three on the left being
from Test_Eater, the three on the right from Eater), represent time-lines for the
respective methods. To test that Eater.Pig_Out() satisfies its grey-box
specification, we create an appropriate instance of the Test_Eater class and apply
test_Pig_Out() to it. This call is represented by the solid arrow at the top-left of
the figure. The method starts by checking the pre-condition, which is
represented by the point labeled with a diamond with a single question mark inside
it. (The pre-condition is just true in this case.) Next it initializes tau to the
empty sequence ε and saves the initial state in the old variable; this point is
labeled (1t) in the figure.
Next, it calls Pig_Out() (on the this object). Since Pig_Out() is not overridden
in Test_Eater, this is a call to Eater.Pig_Out(), which is represented by the solid
arrow from Test_Eater.test_Pig_Out() to Eater.Pig_Out(). (Note that Pig_Out()
cannot be overridden in any case, since it is a final method.)



[Figure: sequence diagram; time-lines grouped under Test_Eater (left) and Eater (right)]

Fig. 8. Sequence Call Diagram for Test_Eater.Pig_Out().



Consider what happens when this method executes. First, it invokes
Eat_Donuts() which we have overridden in Test_Eater. This call is dispatched
to Test_Eater.Eat_Donuts() since the object that Pig_Out() is being applied
to is of type Test_Eater. This dispatch is represented by the solid arrow
from the time-line for Pig_Out() to that for Test_Eater.Eat_Donuts(). Now
Test_Eater.Eat_Donuts() is simply going to delegate the call to Eater.Eat_Donuts()
(represented by the arrow from Test_Eater.Eat_Donuts() to Eater.Eat_Donuts()).
However, before it delegates the call, it records appropriate information about
this call on the trace-record variable tauel; this action is labeled by (2t) in
the figure. Once Eater.Eat_Donuts() finishes (after performing its action,
consisting of updating Eater.cals_Eaten, represented by the point labeled (3)),
control returns to Test_Eater.Eat_Donuts(), represented by the dotted arrow
from Eater.Eat_Donuts() to Test_Eater.Eat_Donuts(). Test_Eater.Eat_Donuts() now
records appropriate additional information on tauel and appends this record to
tau (represented by the point labeled (4t)), and finishes. Thus, control returns
to Eater.Pig_Out(), indicated by the dotted arrow from Test_Eater.Eat_Donuts()
to Eater.Pig_Out(). That method next calls Eat_Burgers() and this call is again
dispatched to Test_Eater.Eat_Burgers(), represented by the solid arrow from
Eater.Pig_Out() to Test_Eater.Eat_Burgers().

The process of recording initial information, delegating the call to 
the corresponding method in Eater, updating cals_Eaten, returning from 
Eater.Eat_Burgers(), and appending the results to tau, is repeated; these are 
represented respectively by the point labeled (5t), the solid arrow from 
Test_Eater.Eat_Burgers() to Eater.Eat_Burgers(), the point (6), the dotted arrow 
from Eater.Eat_Burgers() to Test_Eater.Eat_Burgers(), and the point (7t). At this 
point Test_Eater.Eat_Burgers() finishes, so it returns to Eater.Pig_Out(), which is 
represented by the dotted arrow. That method is also done so it returns to 
Test_Eater.test_Pig_Out(). The final action, the one that we have been building 
up towards, is to check if the post-condition specified in the grey-box specifi- 
cation (5.1) (with tau substituting for t and old_cals_Eaten for cals_Eaten') is 
satisfied, labeled by the diamond with the double question mark. 

Thus by defining Test_Eater as a derived class of Eater, and by overriding 
the hook methods of Eater, we are able to exploit polymorphism to intercept 
the calls that the template method makes to the hook methods. This allows us 
to record information about these calls (and returns) without having to make 
any changes to the template method being tested, indeed without having any 
access to the source code of that method. This allows us to achieve our goal of 
black-box testing of the grey-box behavior of template methods. 
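
The interception mechanism can be sketched in a few lines of Python, used here only for brevity. This is an illustration under stated assumptions, not the code of the testing system: the calorie values, the arguments that pig_out() passes to its hooks, and the layout of a trace record are all hypothetical. Python also has no final methods, so pig_out() is final by convention only.

```python
class Eater:
    """Stand-in for the class under test; all numeric values are hypothetical."""

    def __init__(self):
        self.cals_eaten = 0

    def eat_donuts(self, n):       # hook method
        self.cals_eaten += 200 * n

    def eat_burgers(self, n):      # hook method
        self.cals_eaten += 400 * n

    def pig_out(self):             # template method, never overridden
        self.eat_donuts(2)
        self.eat_burgers(2)


class Test_Eater(Eater):
    """Derived test class: overrides the hooks to record a trace, then delegates."""

    def __init__(self):
        super().__init__()
        self.tau = ()

    def _record(self, name, hook, n):
        before = self.cals_eaten   # information recorded before delegating
        hook(n)                    # delegate to the corresponding Eater method
        # Hypothetical record layout: (hook name, argument, state before, state after).
        self.tau += ((name, n, before, self.cals_eaten),)

    def eat_donuts(self, n):
        self._record("Eat_Donuts", super().eat_donuts, n)

    def eat_burgers(self, n):
        self._record("Eat_Burgers", super().eat_burgers, n)

    def test_pig_out(self):
        self.tau = ()              # initialize the trace
        old = self.cals_eaten      # save the initial state
        self.pig_out()             # Eater.pig_out() dispatches back to the overridden hooks
        return old, self.tau


tester = Test_Eater()
print(tester.test_pig_out())
```

The template method itself is untouched; the trace is built entirely by dynamic dispatch to the overridden hooks.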

It should be noted that Test_Eater.Eat_Donuts() is not the test method for 
testing Eater.Eat_Donuts(). If we wished to test that method, we could include a 
test_Eat_Donuts() method in Test_Eater that would simply save the starting value 
of cals_Eaten, call the Eat_Donuts() method of the Eater class, and then assert 
that the post-condition of (2.1) is satisfied when control returns from that call. 

If there were more than one template method, we could introduce more than 
one trace variable; but since only one template test method will be executing 
at a time, and it starts by initializing tau to (), this is not necessary. Consider 
now the derived class Eater_Jogger. How do we construct Test_Eater_Jogger? It 
should be a derived class of Eater_Jogger, not of Test_Eater, else the redefinitions 
of the hook methods in Eater_Jogger would not be used by the test methods in 
Test_Eater_Jogger. In general, test classes should be final. Test_C is only intended 
to test the methods of C. Another class D, even if D is a derived class of C, would 
have its own test class which would be a derived class of D. 

4 Prototype Implementation 

We have implemented a prototype testing system, which is available at 
http://www.cis.ohio-state.edu/~tyler, that creates the testing classes 
described in Sect. 3. The system inputs the grey-box specifications for template 
methods of the class C under test, and the black-box specifications for the non- 
template methods. The system then creates the source code for the test class, 
along with other adjunct classes needed for the testing process, in particular 
those used in constructing traces when testing the template methods of C. The 
methods to be treated as hooks must be explicitly identified so that they are 
redefined in the test class. An alternate approach would have been to treat all 
non-final methods as hooks; but our approach allows greater flexibility. Each 
redefined hook method that the tool produces also checks its pre- and post- 
condition before and after the dispatched call is made. This helps pinpoint prob- 
lems if a template method fails to satisfy its post-condition. 
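
A redefined hook that checks its own pre- and post-condition around the dispatched call might look as follows. This is a minimal sketch, not the output of the tool: the decorator name `checked`, the snapshot of the old state, and the concrete conditions for eat_donuts() are assumptions made for illustration.

```python
import functools


def checked(pre, post):
    """Wrap a redefined hook so that its pre- and post-condition are checked."""
    def decorate(hook):
        @functools.wraps(hook)
        def wrapper(self, *args):
            if not pre(self, *args):
                raise AssertionError(f"pre-condition of {hook.__name__} violated")
            old = dict(vars(self))           # shallow snapshot of the old state
            result = hook(self, *args)       # the dispatched call
            if not post(self, old, *args):
                raise AssertionError(f"post-condition of {hook.__name__} violated")
            return result
        return wrapper
    return decorate


class Eater:
    def __init__(self):
        self.cals_eaten = 0

    def eat_donuts(self, n):
        self.cals_eaten += 200 * n           # hypothetical behavior


class Test_Eater(Eater):
    # Hypothetical specification: n is non-negative, each donut adds 200 calories.
    @checked(pre=lambda self, n: n >= 0,
             post=lambda self, old, n: self.cals_eaten == old["cals_eaten"] + 200 * n)
    def eat_donuts(self, n):
        super().eat_donuts(n)


t = Test_Eater()
t.eat_donuts(3)
print(t.cals_eaten)
```

A violated hook condition is then reported at the hook itself rather than only as a failed post-condition of the template method.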

Currently, our system does not generate test cases, but creates skeleton calls 
to the test methods, where the user is required to construct test values by hand. 
To do the actual testing, the generated classes are compiled, and the test class 
executed. An example of the system’s output is in Fig. 9. The last output shows 
a case where the grey-box specification was not met. The problem was that the 
compiled Eater class had a bug in the code of Pig_Out(): it passed 4 as the param- 
eter to Eat_Donuts() and 1 as the parameter to Eat_Burgers(); hence, although 
the black-box specification of Pig_Out() was satisfied, its grey-box specification 
was not. 

Test number 1: testing Eat_Donuts.
Test number 1 succeeded!
Test number 2: testing Eat_Burgers.
Test number 2 succeeded!
Test number 3: testing Pig_Out.
Method Eat_Donuts called.
Method Eat_Burgers called.
Postcondition of Pig_Out not met!
tau = (("Eat_Donuts", 4, 4), ("Eat_Burgers", 1, 1))
Test number 3 failed!
*** RESULTS ***
Number of tests run: 3
Number of tests successful: 2

Fig. 9. Output from sample run. 
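
The difference between the two kinds of specification in this failing run can be made concrete with a small check over the reported trace. The sketch below is hypothetical in every detail except the trace itself: the calories per item, the expected hook arguments, and the reading of each trace record as (hook name, argument at call, argument at return) are assumptions chosen so that the black-box condition holds while the grey-box condition fails.

```python
def black_box_holds(old_cals, new_cals, expected_gain):
    """Black-box view: only the net effect on cals_Eaten is checked."""
    return new_cals == old_cals + expected_gain


def grey_box_holds(tau, expected_calls):
    """Grey-box view: the sequence of hook calls and their arguments is checked."""
    observed = [(name, arg) for name, arg, _ in tau]
    return observed == list(expected_calls)


# Trace reported in the sample run.
tau = (("Eat_Donuts", 4, 4), ("Eat_Burgers", 1, 1))

# Hypothetical values, not taken from the specifications.
CALS = {"Eat_Donuts": 200, "Eat_Burgers": 400}
expected_calls = [("Eat_Donuts", 2), ("Eat_Burgers", 2)]

expected_gain = sum(CALS[name] * arg for name, arg in expected_calls)
new_cals = sum(CALS[name] * arg for name, arg, _ in tau)

print(black_box_holds(0, new_cals, expected_gain))   # net effect is the same
print(grey_box_holds(tau, expected_calls))           # call pattern differs
```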



5 Discussion and Future Work 

Our work was motivated by two observations: First, given that perhaps the 
most important aspect of template methods is the hook method call patterns 
they implement, testing such methods requires us to test against their grey-box 
specifications. Second, application developers often build their systems using 
COTS components, including frameworks. If this developer wishes to test such 
a component, she will have to do so without having access to the source code 
of the component; Weyuker [23] also notes the importance of testing COTS 
components without having access to their source code. The approach we have 
developed addresses both of these considerations. 

We conclude with some pointers for future work. We have ignored abstraction 
so far, instead working directly with the data members of the class under test. 
Cheon and Leavens [7] describe a testing system that can work with specifications 
that are given in terms of a conceptual model of the class under test. They do not 
consider grey-box specifications but we believe their approach can be extended 
to deal with grey-box behavior and we intend to explore that. 

A more serious question is that of generating appropriate test cases to achieve 
reasonable coverage. As we noted earlier, our prototype system requires the 
human tester to provide the test cases. One interesting approach for generating 
test cases is used by TestEra [16], a system for specification-based testing 
of Java programs. This system allows us to define, using a first-order relational 
language, complex properties that the objects must meet. Given a specification 
written in this notation, the system automatically generates instances that 
satisfy the pre-condition, so that we can then apply the method under test on the 
object in question. If the specification can be violated, TestEra generates a test 
case that shows that. The specifications that TestEra works with are black-box 
specifications; we plan to investigate whether a similar approach can be used to 
deal with grey-box specifications. 

Another important question relates to the scalability of the methodology 
when applied to systems whose traces contain complex objects. The current 
prototype attempts to generate suitable clone() (i.e., deep copy) methods when 
they are not provided by the user in order to save the object state. Sufficiently 
complex objects coupled with long traces may require a prohibitive amount of 
memory during testing. One possible way to lessen the problem is to save only 
values and object references that are mentioned in the specifications, instead 
of copying whole objects. (This is being done in the latest version of the tool.) 
Another possibility is to only store information about the particular changes 
made to objects during execution. These issues must be addressed before our 
testing systems are useable in practical settings. 



References 



1. R. Alexander and J. Offutt. Criteria for testing polymorphic relationships. In 
Int. Symp. on Softw. Reliability Eng., pages 15-23, 2000. 

2. M. Barnett, W. Grieskamp, C. Kerer, W. Schulte, C. Szyperski, N. Tilmann, and 
A. Watson. Serious specifications for composing components. In 6th ICSE Work- 
shop on Component-based Software Engineering, pages 1-6, 2003. 

3. R. Binder. Testing Object-Oriented Systems. Addison-Wesley, 1999. 







4. G. Booch, J. Rumbaugh, and I. Jacobson. The Unified Modeling Language User 
Guide. Addison-Wesley, 1999. 

5. M. Büchi and W. Weck. The greybox approach: when blackbox specifications 
hide too much. Technical Report TUCS TR No. 297, Turku Centre for Computer 
Science, 1999. Available at http://www.tucs.abo.fi/. 

6. M.-H. Chen and H. M. Kao. Testing object-oriented programs — an integrated 
approach. In Int. Symp. on Softw. Reliability Eng., pages 73-83, 1999. 

7. Y. Cheon and G. Leavens. A simple and practical approach to unit testing: The 
JML and JUnit way. In Proc. of ECOOP 2002, pages 231-255. Springer-Verlag 
LNCS, 2002. 

8. J. Duran and S. Ntafos. An evaluation of random testing. IEEE Trans. on Software 
Eng., 10:438-444, 1984. 

9. M.E. Fayad and D.C. Schmidt. Special issue on object oriented application 
frameworks. Comm. of the ACM, 40, October 1997. 

10. G. Froehlich, H. Hoover, L. Liu, and P. Sorenson. Hooking into object-oriented 
application frameworks. In Proc. of 1997 Int. Conf. on Software Engineering, pages 
141-151. ACM, 1997. 

11. E. Gamma, R. Helm, R. Johnson, and J. Vlissides. Design Patterns: Elements of 
Reusable OO Software. Addison-Wesley, 1995. 

12. D. Hamlet and R. Taylor. Partition testing does not inspire confidence. IEEE 
Trans. on Software Eng., 16(12):1402-1411, 1990. 

13. R. Johnson and B. Foote. Designing reusable classes. Journal of OOP, 1:26-49, 
1988. 

14. C. Jones. Systematic Software Development Using VDM. Prentice-Hall, 1990. 

15. B. Liskov and J. Wing. A behavioral notion of subtyping. ACM Trans. on Prog. 
Lang. and Systems, 16:1811-1841, 1994. 

16. D. Marinov and S. Khurshid. TestEra: A novel framework for automated testing 
of Java programs. In Proc. of 16th ASE. IEEE, 2001. 

17. R. McDaniel and J. D. McGregor. Testing the polymorphic interactions between 
classes. Technical Report 94-103, Dept. of Computer Sc., Clemson University, 1994. 

18. B. Meyer. Object-Oriented Software Construction. Prentice Hall, 1997. 

19. S. Owicki and D. Gries. An axiomatic proof technique for parallel programs. Acta 
Informatica, 6(1):319-340, 1976. 

20. W. Pree. Meta patterns: a means for capturing the essentials of reusable OO 
design. In Proceedings of the Eighth ECOOP, pages 150-162, 1994. 

21. A. Rountev, A. Milanova, and B. Ryder. Fragment class analysis for testing of 
polymorphism in Java software. In Int. Conf. on Softw. Eng., pages 210-220, 
2003. 

22. N. Soundarajan and S. Fridella. Framework-based applications: From incremental 
development to incremental reasoning. In W. Frakes, editor, Proc. of Sixth Int. 
Conf. on Software Reuse: Advances in Software Reusability, LNCS 1844, pages 
100-116. Springer, 2000. 

23. E. Weyuker. Testing component-based software: A cautionary tale. IEEE Software, 
15(5):54-59, 1998. 




On Checking Whether a Predicate Definitely Holds* 



Alper Sen and Vijay K. Garg 



Dept. of Electrical and Computer Engineering 
The University of Texas at Austin 
Austin, TX, 78712, USA 
{sen,garg}@ece.utexas.edu 
http://www.ece.utexas.edu/~{sen,garg} 



Abstract. Predicate detection is an important problem in testing and debugging 
distributed programs. Cooper and Marzullo introduced two modalities possibly 
and definitely as a solution to this problem. Given a predicate p, a computation 
satisfies possibly : p if p is true for some global state in the computation. A 
computation satisfies definitely : p if all paths from the initial to the final global state go 
through some global state that satisfies p. In general, definitely modality is used to 
detect good conditions such as “a leader is eventually chosen by all processes”, or 
“a commit point is reached by every process”, whereas possibly modality is used 
to detect bad conditions such as violation of mutual exclusion. There are several ef- 
ficient algorithms for possibly modality in the literature [10,14,1,2,30]. However, 
this is not the case for definitely modality. Cooper and Marzullo’s definitely : p 
algorithm for arbitrary p has a worst-case space and time complexity exponential 
in the number of processes. This is due to the state explosion problem. In this paper 
we present efficient algorithms for detecting definitely : p. In particular, we give 
a simple algorithm that uses polynomial space. Then, we present an algorithm that 
can significantly reduce the global state-space. We determine necessary conditions 
and sufficient conditions under which detecting definitely : p may be efficiently 
solved. We apply our algorithms to example protocols, achieving a speedup of 
over 100, compared to partial order reduction based technique of SPIN [13]. 



1 Introduction 

A fundamental problem in distributed computing is predicate detection — deciding 
whether an execution trace of a distributed program satisfies a given predicate. This 
problem arises in many contexts such as testing and debugging of distributed programs. 
For example, when debugging a distributed mutual exclusion algorithm, it is useful to 
monitor the system to detect concurrent accesses to the shared resources. 

Cooper and Marzullo introduced two modalities for predicate detection, which 
are denoted by possibly and definitely. Given a predicate p, a computation satisfies 
possibly : p if p is true for some global state in the computation. A computation satisfies 
definitely : p if all paths from the initial state to the final global state go through some 
global state that satisfies p. In general, possibly modality is used to detect bad 
conditions, such as the system reaching a global state in which the mutual exclusion predicate 
is false. In contrast, definitely modality is in general used to detect good conditions 
such as “a leader is eventually chosen by all processes”, or “a commit point is reached 
by every process”. Cooper and Marzullo’s definitions of these modalities established 
an important conceptual framework for predicate detection, which has been the basis of 
considerable research. However, most of the research has focused on possibly modality 
[10,14,1,2,30]. 

Cooper and Marzullo present an algorithm for detecting definitely : p for an arbitrary 
predicate p. The worst-case space and time complexity of their algorithm is 
exponential in the number of processes. This is due to the state explosion problem: in a 
distributed system of n processes, the number of possible global states (state-space) can 
be of size $O(m^n)$, where m is the maximum number of events on a process. 
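
The growth in the number of global states can be observed by brute force on small computations. The sketch below is an illustration only; the encoding of a global state as a vector of per-process event counts and the function name are our assumptions. With no messages, n processes of m events each have exactly $(m+1)^n$ consistent global states, and messages can only remove states.

```python
from itertools import product


def count_global_states(events_per_process, messages=()):
    """Count the consistent global states of a small computation by brute force.

    A global state is a vector (c_0, ..., c_{n-1}), where c_i is the number of
    events process i has executed. A message is ((sp, si), (rp, ri)): the si-th
    event of process sp is the send, the ri-th event of process rp the receive
    (processes are numbered from 0, events from 1).
    """
    def consistent(cut):
        # If a receive is included, the matching send must be included too.
        return all(cut[rp] < ri or cut[sp] >= si
                   for (sp, si), (rp, ri) in messages)

    candidates = product(*(range(m + 1) for m in events_per_process))
    return sum(1 for cut in candidates if consistent(cut))


for n in (1, 2, 3, 4):
    print(n, count_global_states([3] * n))
```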

This paper presents efficient algorithms for detecting definitely : p. We first present 
a simple algorithm for definitely : p that uses $O(nm)$ space in Section 4. Then, we 
present a polynomial-time state-space reduction algorithm that enables us to work on a 
distributed computation that is in general much smaller than the original computation. 
We prove that the original computation satisfies definitely : p if and only if the smaller 
computation satisfies it. It is, in general, coNP-complete to detect a predicate under 
definitely modality [28]. In Sections 5 and 6, we determine necessary conditions and 
sufficient conditions under which detecting definitely : p may be efficiently solved. 
In order to develop these conditions, we use lattice theoretic properties of distributed 
computations. We validate the effectiveness of our algorithms with experimental studies 
in Section 7. For this purpose, we implement our algorithms in the Partial Order Trace 
Analyzer (POTA) tool [27] and compare their performance to the partial order reduction 
based algorithms of the model checker SPIN [13]. In one case, our algorithms are 
significantly faster and more space efficient: we measured a speedup of over 100. 

Our work constitutes part of the POTA tool [27,23] for testing distributed program 
execution traces using temporal logic predicates. Figure 1 displays an overview of the POTA 
architecture. POTA consists of an instrumentation module, a translator module that 
translates execution traces into Promela [13] (the input language of the SPIN model 
checker), and an analyzer module. The use of a partial order model for execution traces and 
the use of an effective abstraction technique for temporal logic verification called 
computation slicing are significant aspects of POTA and constitute the analyzer module. 
POTA implements polynomial-time temporal logic predicate detection algorithms. The 
temporal logic used in POTA is a subset of CTL [3]. With the results of this paper, we 
extend the efficient predicate detection algorithms in POTA with the definitely operator. 
Atomic propositions of the logic used in POTA are regular predicates, which occur widely 
in practice during verification. Some examples of regular predicates are conjunctions of 
local predicates [8,15] such as “all processes are in red state”, certain channel predicates 
[8] such as “at most k messages are in transit from process $P_i$ to $P_j$”, and some 
relational predicates [8]. 



2 Related Work 

Our approach exploits the structure of the predicate itself, by imposing restrictions 
on it, to evaluate its value efficiently for a given computation. Polynomial-time algorithms 
for possibly : p have been developed when p belongs to the classes of conjunctive [10,14], 
observer-independent [1], linear [2], and relational predicates [30]. An extensive 
survey of predicate detection techniques is given in [22]. 

Tarafdar and Garg [28] proved that it is, in general, NP-complete to detect a predi- 
cate under controllable modality. A computation satisfies controllable : p if every state 
on some path from the initial global state to the final global state satisfies p. Since the 
problem of detecting a predicate under definitely modality is the dual of the problem 
of detecting a predicate under controllable modality, it is, in general, coNP-complete 
to detect a predicate under definitely modality. Using Tarafdar and Garg’s [29] NP- 
completeness result for controlling a special case of 2-CNF predicates, called indepen- 
dent mutual exclusion predicates, we can easily deduce that detecting a special case of 
2-DNF predicates, which is the dual of independent mutual exclusion predicates, under 
definitely modality is coNP-complete in general. 

Fromentin and Raynal [7] presented a polynomial-time algorithm to solve the 
predicate detection problem for proper modality, which is a special case of definitely. A 
computation satisfies proper : p if all paths from the initial state to the final global state 
go through a unique global state that satisfies p. 

The definitely : p problem has efficient solutions when the predicate is 1-CNF or 
1-DNF [8]. However, the complexity of definitely : p for regular p is an open problem. 
In this paper, we present conditions under which the problem can be solved efficiently 
for both arbitrary and regular predicates. 

The idea of using temporal logic in program testing has been applied in several 
tools such as the commercial Temporal Rover tool (TR) [6], the MaC tool [17], and 
the JPaX tool [12]. TR allows the user to specify temporal formulas in programs. 
These temporal formulas are translated into Java code before compilation. The MaC 
and JPaX tools consider a totally ordered view of an execution trace and therefore can 
potentially miss bugs that can be deduced from a partial order view of the trace. Hallal 
et al. [11] use a partial order view of an execution trace, as in POTA. They translate 
execution traces into SDL and use commercial SDL tools for testing the translated traces. 
POTA incorporates several polynomial-time (polynomial in the number of processes) 
predicate detection algorithms, whereas the complexity is exponential in [11]. 

3 Model 

We assume a loosely-coupled message-passing asynchronous system without any shared 
memory or a global clock. A distributed program consists of n sequential processes 
denoted by $P_1, P_2, \ldots, P_n$ communicating via asynchronous messages. In this paper, 
we are concerned with a single computation (execution) of a distributed program. We 
assume that no messages are altered or spuriously introduced. We do not make any 
assumptions about the FIFO nature of channels. 

The execution of a process in a computation can be viewed as a sequence of events, 
with events across processes ordered by Lamport’s happened-before relation, $\rightarrow$ [18]. 
We use lowercase letters $e$ and $f$ to represent events. The happened-before relation 
between any two events $e$ and $f$ can be formally stated as the smallest relation such that 
$e \rightarrow f$ if and only if $e$ occurs before $f$ in the same process, or $e$ is a send of a message 
and $f$ is a receive of that message, or there exists an event $g$ such that $e$ happened-before 
$g$ and $g$ happened-before $f$. We represent the set of events as the union of events from 
each process, $E = \bigcup_{1 \le i \le n} E_i$. We define a distributed computation as the 
partially ordered set consisting of the set of events together with the happened-before 
relation and denote it by $(E, \rightarrow)$. 
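
The definition of happened-before as a smallest relation translates directly into a transitive closure computation. The following sketch assumes a hypothetical trace encoding that is not part of the paper: an event is a pair (process, index), with processes numbered from 0 and events from 1, and a message is a pair (send event, receive event).

```python
def happened_before(events_per_process, messages):
    """Return the set of pairs (e, f) such that e happened-before f."""
    events = [(p, i) for p, m in enumerate(events_per_process)
              for i in range(1, m + 1)]
    # Base cases: consecutive events on one process, and send -> receive.
    hb = {((p, i), (p, i + 1)) for (p, i) in events if i < events_per_process[p]}
    hb |= set(messages)
    # Transitivity, by fixed-point iteration (adequate for small traces).
    changed = True
    while changed:
        changed = False
        for (a, b) in list(hb):
            for (c, d) in list(hb):
                if b == c and (a, d) not in hb:
                    hb.add((a, d))
                    changed = True
    return hb


# Two processes with two events each; the first event of process 0 sends a
# message that is received by the first event of process 1.
hb = happened_before([2, 2], {((0, 1), (1, 1))})
print(sorted(hb))
```

Events that are unrelated in either direction, such as the second events of the two processes in this example, are concurrent.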

We define a consistent cut of a computation $(E, \rightarrow)$ as a subset $G \subseteq E$ such that 
$f \in G \wedge e \rightarrow f \Rightarrow e \in G$. We use uppercase letters $G$, $H$, $J$, and $K$ to represent 
consistent cuts. A consistent cut captures the notion of a reachable global state. We use 
consistent cut and global state interchangeably. We denote the set of consistent cuts of 
any distributed computation $(E, \rightarrow)$ by $\mathcal{C}(E)$. It is well-known that the set of consistent 
cuts of any distributed computation $(E, \rightarrow)$ forms a distributive lattice under the relation 
$\subseteq$ [19, 9]. We denote this lattice by $L = (\mathcal{C}(E), \subseteq)$ and also call it the state-space of 
the distributed computation. For any partially ordered set, we use $\sqcup$ and $\sqcap$ to denote the 
join and meet operators. Note that the join (resp. meet) of two consistent cuts corresponds to 
their union (resp. intersection). We use $\bot$ to denote the initial consistent cut, $E$ to denote 
the final consistent cut of all processes, and $\top$ to denote a fictitious final cut occurring 
after $E$. 
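
For small computations, the lattice structure can be checked directly. In the sketch below, which is an illustration under our own encoding, a cut is a vector of per-process event counts, so the union and intersection of two cuts become the componentwise maximum and minimum. The message format ((sp, si), (rp, ri)) is a hypothetical choice: the si-th event of process sp is the send and the ri-th event of process rp is the receive.

```python
from itertools import product


def consistent_cuts(events_per_process, messages=()):
    """Enumerate all consistent cuts as vectors of per-process event counts."""
    def consistent(cut):
        # f in G and e -> f implies e in G; with prefix vectors only the
        # message edges need to be checked.
        return all(cut[rp] < ri or cut[sp] >= si
                   for (sp, si), (rp, ri) in messages)

    candidates = product(*(range(m + 1) for m in events_per_process))
    return {cut for cut in candidates if consistent(cut)}


def join(g, h):
    """Union of two cuts."""
    return tuple(map(max, g, h))


def meet(g, h):
    """Intersection of two cuts."""
    return tuple(map(min, g, h))


cuts = consistent_cuts([2, 2], [((0, 1), (1, 2))])
closed = all(join(g, h) in cuts and meet(g, h) in cuts for g in cuts for h in cuts)
print(len(cuts), closed)
```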

We denote the set of maximal (with respect to the happened-before relation) elements 
of a consistent cut $G$ by frontier($G$). Figure 2 shows a computation and its lattice of 
consistent cuts. A consistent cut in the figure is represented by its frontier. For example, 
the consistent cut $\{e_3, e_2, e_1, f_2, f_1, \bot\}$ is represented by $\{e_3, f_2\}$. A consistent cut $H$ 
is reachable from a consistent cut $G$ iff it is possible to attain $H$ from $G$ by executing 
zero or more events. It is easy to see that $H$ is reachable from $G$ iff $G \subseteq H$. We define the 
successor of a cut by a relation $\triangleright \subseteq \mathcal{C}(E) \times \mathcal{C}(E)$ such that $G \triangleright H$ if and only if 
$H = G \cup \{e\}$ for some $e \in E$ such that $e \notin G$. We say that $H$ is a successor of $G$ and 
$G$ is a predecessor of $H$. A path $G_0, G_1, \ldots, G_l$ of $(\mathcal{C}(E), \subseteq)$ satisfies 
$G_i \triangleright G_{i+1}$ for each $0 \le i < l$. 

Fig. 2. (a) A computation (b) meet-irreducible cuts (c) corresponding lattice of the computation 



A predicate is defined as a boolean-valued function on variables of processes. Given 
a consistent cut, a predicate is evaluated with respect to the values of variables resulting 
after executing all events in the cut. If a predicate p evaluates to true for a consistent 
cut C, we say that “C satisfies p”. We leave the predicate undefined for $\top$. A global 
predicate is local if it depends on variables of a single process. 

We say that a predicate is regular if the set of consistent cuts that satisfy the predicate 
forms a sublattice of the lattice of consistent cuts. Equivalently, if two consistent cuts 
satisfy a regular predicate then the cuts given by their set intersection and set union also 
satisfy the predicate. Let inf(p) and sup(p) denote the least and the greatest 
consistent cut that satisfies a given predicate p, respectively. From the definition of a regular 
predicate we deduce that both inf(p) and sup(p) exist for a regular predicate. There 
are efficient algorithms for detecting regular predicates under possibly and controllable 
modalities [9,25]. 
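
Regularity and the cuts inf(p) and sup(p) can be checked by brute force on a small computation. The sketch below is illustrative only and rests on two simplifying assumptions: a cut is a vector of per-process event counts, and a predicate is a function of that vector rather than of program variables. This is not one of the efficient algorithms of [9,25].

```python
from itertools import product


def consistent_cuts(events_per_process, messages=()):
    def consistent(cut):
        return all(cut[rp] < ri or cut[sp] >= si
                   for (sp, si), (rp, ri) in messages)
    return {c for c in product(*(range(m + 1) for m in events_per_process))
            if consistent(c)}


def is_regular(cuts, p):
    """p is regular iff its satisfying cuts are closed under union and intersection."""
    sat = [c for c in cuts if p(c)]
    return all(p(tuple(map(max, g, h))) and p(tuple(map(min, g, h)))
               for g in sat for h in sat)


def inf_sup(cuts, p):
    """Least and greatest satisfying cut of a regular, satisfiable predicate p."""
    sat = [c for c in cuts if p(c)]
    inf_p = tuple(min(col) for col in zip(*sat))
    sup_p = tuple(max(col) for col in zip(*sat))
    return inf_p, sup_p


cuts = consistent_cuts([2, 2], [((0, 1), (1, 2))])
conjunctive = lambda c: c[0] >= 1 and c[1] == 1   # conjunction of local predicates
disjunctive = lambda c: c[0] >= 1 or c[1] >= 1    # not closed under intersection

print(is_regular(cuts, conjunctive), inf_sup(cuts, conjunctive))
print(is_regular(cuts, disjunctive))
```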



4 Polynomial-Space Algorithm 

The performance of algorithms for detecting definitely : p can be improved by 
considering a smaller state-space, that is, a smaller computation than the original computation. 
In this section, we present a polynomial-time algorithm for reducing the size of the 
computation. We show that detecting definitely : p on the original computation is the 
same as detecting definitely : p on the smaller computation. For this purpose, we define 
an interval of a computation $(E, \rightarrow)$ with respect to consistent cuts C and D as the 
computation interval(C, D), which is a subset of $(E, \rightarrow)$ whose consistent cuts are exactly 
the cuts of $(E, \rightarrow)$ between C and D (including C and D). 

We state informally a lemma before presenting our state-space reduction algorithm. 
Given three consistent cuts G, H, and J, where H is reachable from J and H is a 
successor of G, the intersection of G and J is either J or a predecessor of J. We 
present the proofs in the extended version of this paper [26]. 

Theorem 1 (NSC). Given a computation $(E, \rightarrow)$, definitely : p holds in $(E, \rightarrow)$ iff 
definitely : p holds in interval(C, D), where C is the meet of the predecessors of inf(p), 
if the predecessors exist, otherwise inf(p), and D is the join of the successors of sup(p), if 
the successors exist, otherwise sup(p). 

Proof. Without loss of generality, assume that both C and D exist and are different from 
the initial and final consistent cut of $(E, \rightarrow)$. We prove the contrapositives. 

First, assume that definitely : p does not hold in interval(C, D). We obtain a path from 
the initial consistent cut to the final consistent cut in $(E, \rightarrow)$ 
as follows: Pick an arbitrary path from the initial consistent cut of $(E, \rightarrow)$ to C. We 
know that none of the cuts on this path satisfy p since all cuts that satisfy p belong to 
interval(C, D). Next, using the assumption, continue this arbitrary path with a path in 
interval(C, D) where none of the cuts on the path satisfy p. Finally, pick an arbitrary 
path from D to the final consistent cut of $(E, \rightarrow)$. 

Now we prove that if there exists a path from the initial to the final cut in $(E, \rightarrow)$ where 
all cuts on the path satisfy $\neg p$ then there exists a path from the initial to the final consistent 
cut in interval(C, D) where all cuts on the path satisfy $\neg p$. We prove the claim in two 
steps. 

Step 1: We first show that if there exists a path, V, from the initial to the final 
consistent cut in $(E, \rightarrow)$ where all cuts on the path satisfy $\neg p$ then there exists a path 
from the initial to the final cut in interval(C, E) where all cuts on the path satisfy $\neg p$. 

Let $\beta$ be the first cut on the path V such that inf(p) $\subseteq \beta$. Let $\beta'$ be the predecessor of 
$\beta$ on the path V. From the lemma stated above, the meet of $\beta'$ and inf(p) is either inf(p) 
or a predecessor of inf(p), say $C'$. However, if the meet is inf(p) then inf(p) $\subseteq \beta'$. 
Since $\beta'$ is also on the path V and precedes $\beta$, it would then be an earlier cut on V that 
contains inf(p). This is a contradiction since $\beta$ is the first such cut. Therefore the meet of 
$\beta'$ and inf(p) is $C'$. 

There exists a path from $C'$ to $\beta'$ because $C' \subseteq \beta'$. Furthermore every cut on this path 
satisfies $\neg p$. We prove this as follows. From the definition of interval(C, D), only cuts 
in interval(inf(p), sup(p)) satisfy p. Now consider all cuts F such that $C' \subseteq F \subseteq \beta'$. 
We have that inf(p) $\not\subseteq \beta'$ and therefore inf(p) $\not\subseteq F$. Therefore, $\beta'$ and all such F do 
not satisfy p. Since C is the meet of all predecessors of inf(p) and $C'$ is a predecessor 
of inf(p), $C \subseteq C'$ and therefore $C \subseteq F$. Also, all cuts from C to $C'$ satisfy $\neg p$ since 
none of them belong to interval(inf(p), sup(p)). 

We obtain the required path as follows. Choose an arbitrary path from C to $C'$, then 
continue the path from $C'$ to $\beta'$ and then to $\beta$. Continue the path from $\beta$ to the final cut 
with the same path from $\beta$ to the final cut as in path V. 

Step 2: Now we show that if there exists a path, V, from the initial to the final 
consistent cut in interval(C, E) where all cuts on the path satisfy $\neg p$ then there exists 
a path from the initial to the final consistent cut in interval(C, D) where all cuts on the 
path satisfy $\neg p$. 

The proof is similar to Step 1 with the paths reversed. In this case we choose $\beta$ as the 
last cut on the path V such that $\beta \subseteq$ sup(p) and $\beta'$ as the successor of $\beta$ on the path V. 
Furthermore, we choose $D'$ as a successor of sup(p). We can show in a similar fashion 
as in Step 1 that there exists a path from $\beta'$ to $D'$ where all cuts on the path satisfy $\neg p$. 
Finally, we can construct a path from C to D as the concatenation of the paths from C 
to $\beta$, $\beta$ to $\beta'$, $\beta'$ to $D'$, and $D'$ to D. □ 



We can compute interval(C, D) by computing inf(p) and sup(p) in $O(n|E|)$ time 
for regular p [9]. Similarly, we can compute the predecessors and successors of a cut in 
$O(n)$ time. Note that the above theorem is not restricted to predicates with a single least 
and greatest cut only. For example, if the predicate has several least cuts then first we take 
the intersection of all those cuts; second, we find the predecessors of the intersection; 
and finally, we compute the intersection of the predecessors to obtain C. 
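
The construction of C and D can be sketched by brute force on a small computation. The code below is illustrative and is not the $O(n|E|)$ procedure of [9]: it enumerates all cuts, assumes that at least one cut satisfies p, and represents cuts as vectors of per-process event counts, with messages encoded as in the hypothetical format ((sp, si), (rp, ri)).

```python
from itertools import product


def consistent_cuts(events_per_process, messages=()):
    def consistent(cut):
        return all(cut[rp] < ri or cut[sp] >= si
                   for (sp, si), (rp, ri) in messages)
    return {c for c in product(*(range(m + 1) for m in events_per_process))
            if consistent(c)}


def neighbours(cut, cuts, step):
    """Cuts obtained from `cut` by adding (step=+1) or removing (step=-1) one event."""
    shifted = (cut[:i] + (cut[i] + step,) + cut[i + 1:] for i in range(len(cut)))
    return [c for c in shifted if c in cuts]


def reduced_interval(cuts, p):
    """Return C, D and the set of cuts of interval(C, D) as in Theorem 1."""
    sat = [c for c in cuts if p(c)]
    inf_p = tuple(min(col) for col in zip(*sat))   # intersection of satisfying cuts
    sup_p = tuple(max(col) for col in zip(*sat))   # union of satisfying cuts
    preds = neighbours(inf_p, cuts, -1) or [inf_p]
    succs = neighbours(sup_p, cuts, +1) or [sup_p]
    C = tuple(min(col) for col in zip(*preds))     # meet of the predecessors
    D = tuple(max(col) for col in zip(*succs))     # join of the successors
    inside = {c for c in cuts
              if all(lo <= x <= hi for lo, x, hi in zip(C, c, D))}
    return C, D, inside


cuts = consistent_cuts([3, 3])
C, D, inside = reduced_interval(cuts, lambda c: c[0] >= 2 and c[1] >= 2)
print(C, D, len(inside), "of", len(cuts), "cuts remain")
```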

Although the time complexity of computing interval(C, D) is polynomial, the time 
and space complexity of detecting definitely : p on this reduced state-space may be 
exponential since interval(C, D) may contain an exponential number of global states. However, 
it is always better to work on interval(C, D) rather than $(E, \rightarrow)$ since interval(C, D) 
is a subset of $(E, \rightarrow)$. In fact, we believe that interval(C, D) is generally much smaller 
than the original computation $(E, \rightarrow)$ and we validate this belief with experimental work. 
Furthermore, Theorem 1 is orthogonal to the conditions we will present for detecting 
definitely : p, that is, we can always first compute interval(C, D) and then apply those 
conditions. 

Next, we present a polynomial-space algorithm for definitely : p. Cooper and 
Marzullo [4] presented a worst-case exponential space and time algorithm when they introduced 
definitely : p. Their algorithm detects definitely : p using level sets where a level set is 
the set of successors of a consistent cut. The algorithm starts from the initial consistent 
cut. If p is true in the initial consistent cut we are done. Otherwise, it constructs the next 
level set including only those consistent cuts in which $\neg p$ is true. Continuing in this 
manner, if the algorithm can reach the final consistent cut, then definitely : p is false; 
otherwise, it is true. This algorithm requires space proportional to the size of the largest 
level set, which is exponential. We obtain a simple space efficient algorithm for detecting 
definitely : p by generating all paths of cuts for the given computation. This algorithm is 
based on generating linearizations of a partial order [21]. For each such path, we check 
whether $\neg p$ holds on every cut on the path. If such a path exists then definitely : p is 
not satisfied; otherwise it is satisfied. The length of every path is at most $|E|$, the total 
number of events in the system. A frontier of a consistent cut can be represented by an 
n-dimensional vector. Therefore, for each consistent cut $O(n)$ space is required, giving 
us the space complexity of $O(n|E|)$. The time complexity is bounded by the number of 
paths, which may be exponential in the number of processes. We can improve the time 
complexity using the computation slicing technique explained later in this paper. 

Figure 3 shows a polynomial-space definitely : p algorithm that uses the techniques 
developed in this section. 



Input: a computation (E, →) and a predicate p
Output: whether definitely : p is satisfied

compute inf(p) and sup(p);
let C be the intersection of the predecessors of inf(p);
let D be the union of the successors of sup(p);
use C and D to obtain interval(C, D);    // reduce the number of global states
for each path in interval(C, D) do       // obtain paths using [21]
    let G be the first cut on the path;
    while G ≠ D and G satisfies ¬p do
        G := successor of G on the path;
    endwhile;
    if G = D and G satisfies ¬p then     // final cut is reached
        return false;
    endif;
endfor;
return true;



Fig. 3. A polynomial-space algorithm for detecting definitely : p 
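
The path search of Fig. 3 can be illustrated with a small Java sketch. It is not the authors' implementation: it assumes a toy encoding in which a cut is a bitmask over at most 31 events and `preds[e]` (a hypothetical name) lists the events that must precede event e. It also skips the interval(C, D) reduction and searches depth-first with backtracking instead of enumerating linearizations as in [21]. The recursion depth is at most |E|, so the space used stays polynomial, while the running time can still be exponential.

```java
import java.util.function.IntPredicate;

/** Illustrative sketch only: cuts are bitmasks over at most 31 events. */
final class DefinitelySketch {
    private final int[] preds; // preds[e] = bitmask of events that must precede event e
    private final int full;    // the final cut, containing every event

    DefinitelySketch(int[] preds) {
        this.preds = preds;
        this.full = (1 << preds.length) - 1;
    }

    /** True iff some path of consistent cuts from 'cut' to the final cut satisfies notP everywhere. */
    private boolean notPPathExists(int cut, IntPredicate notP) {
        if (!notP.test(cut)) {
            return false;
        }
        if (cut == full) {
            return true;
        }
        for (int e = 0; e < preds.length; e++) {
            boolean absent = (cut & (1 << e)) == 0;
            boolean enabled = (preds[e] & ~cut) == 0; // all predecessors already in the cut
            if (absent && enabled && notPPathExists(cut | (1 << e), notP)) {
                return true;
            }
        }
        return false;
    }

    /** definitely : p holds iff no path from the empty cut to the final cut avoids p. */
    boolean definitely(IntPredicate p) {
        return !notPPathExists(0, p.negate());
    }
}
```

For example, with events e0 → e1 on one process (bits 0 and 1) and f0 on another (bit 2), the predicate "the cut contains e0 but not e1" holds definitely, whereas "the cut is exactly {e0}" does not, because a path may execute f0 first.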



5 Polynomial-Time Necessary Conditions

Now we present a polynomial-time necessary condition to detect definitely : p that uses
meet-irreducible cuts [5]. We say that a cut is meet-irreducible if it has only one successor
consistent cut. For example, the predecessors of the final consistent cut of a computation
(e.g. the predecessors of {e3, f3} in Figure 2(b)) are all meet-irreducible cuts. The number
of meet-irreducible cuts of a distributive lattice is generally exponentially smaller than
the number of all cuts in the lattice. In fact, for a finite distributive lattice, the number
of meet-irreducible cuts is exactly equal to the size of the longest chain in the lattice
[5]. In our case, the length of the longest chain is equal to the number of events |E|.
Hence, if some computation can be done on meet-irreducible cuts, we get a significant
computational advantage.

Theorem 2 (NC). Given a computation (E, →) and a regular predicate p, if ¬p holds
at the initial consistent cut and at the successor of every meet-irreducible cut, then
definitely : p does not hold in (E, →).

Proof. We show that there exists a path from the initial to the final consistent cut in the
computation (E, →) on which all cuts satisfy ¬p. Given an arbitrary consistent
cut C that satisfies ¬p and is different from the final consistent cut, we first show that there
exists a successor of C that satisfies ¬p. There are two cases.

Case 1: C has a single successor. In this case C is a meet-irreducible cut, and from the
assumption ¬p holds at the successor of C.

Case 2: C has at least two successors. Observe that if more than one successor of C
satisfies p then, from the regularity of p, the intersection of those successor cuts, which is
C, satisfies p. This contradicts the choice of C. Hence at most one successor of C satisfies p,
and since C has at least two successors, there exists at least one successor of C where
¬p holds.

We construct the path as follows. From the assumption, ¬p holds at the initial cut.
From the above, for every consistent cut that satisfies ¬p and is not the final cut, we can
find a successor consistent cut that satisfies ¬p. Repeating this step, we eventually reach
the final consistent cut, which is the successor of a cut that satisfies ¬p. □

The converse of Theorem 2 is false. Figure 2(c) displays the lattice of consistent cuts
of the computation in Figure 2(a). From the lattice we observe that this computation
satisfies the conclusion of Theorem 2 (definitely : p does not hold). However, the hypothesis
of the theorem does not hold, because the successor of the meet-irreducible cut {f3}
satisfies p. A similar condition can be given for join-irreducible cuts. A join-irreducible cut
of a distributive lattice is a cut that has only one predecessor consistent cut. Meet- and
join-irreducible cuts are duals of each other.

Theorem 3. Given a computation (E, →) and a regular predicate p, if ¬p holds at the
final consistent cut and at the predecessor of every join-irreducible cut, then definitely : p
does not hold in (E, →).

We can check the condition of Theorem 2 (resp. Theorem 3) by finding the meet-irreducible
(resp. join-irreducible) cuts of the computation in O(n²|E|) time for regular p [24].
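
To make the hypothesis of Theorem 2 concrete, the following Java sketch checks it by brute force. It is only an illustration under stated assumptions: cuts are bitmasks over at most 31 events, `preds[e]` (a hypothetical name) lists the events that must precede event e, and all subsets of events are enumerated, which takes exponential time. It is not the polynomial-time algorithm of [24].

```java
import java.util.function.IntPredicate;

/** Illustrative brute-force check of the hypothesis of Theorem 2 (not the algorithm of [24]). */
final class MeetIrreducibleSketch {
    private final int[] preds; // preds[e] = bitmask of events that must precede event e

    MeetIrreducibleSketch(int[] preds) {
        this.preds = preds;
    }

    /** A cut is consistent if it contains the predecessors of each of its events. */
    boolean isConsistent(int cut) {
        for (int e = 0; e < preds.length; e++) {
            if ((cut & (1 << e)) != 0 && (preds[e] & ~cut) != 0) {
                return false;
            }
        }
        return true;
    }

    /** For a consistent cut: its unique successor, or -1 if it is not meet-irreducible. */
    int uniqueSuccessor(int cut) {
        int successor = -1;
        int count = 0;
        for (int e = 0; e < preds.length; e++) {
            if ((cut & (1 << e)) == 0 && (preds[e] & ~cut) == 0) {
                successor = cut | (1 << e);
                count++;
            }
        }
        return count == 1 ? successor : -1;
    }

    /** True iff notP holds at the initial cut and at the successor of every meet-irreducible cut. */
    boolean necessaryConditionHypothesis(IntPredicate notP) {
        if (!notP.test(0)) {
            return false;
        }
        for (int cut = 0; cut < (1 << preds.length); cut++) {
            if (!isConsistent(cut)) {
                continue;
            }
            int succ = uniqueSuccessor(cut);
            if (succ != -1 && !notP.test(succ)) {
                return false;
            }
        }
        return true;
    }
}
```

If the method returns true for a regular predicate p, Theorem 2 lets us conclude that definitely : p does not hold; if it returns false, nothing follows, since the converse of the theorem fails.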

Next we present another polynomial-time condition for detecting definitely : p, based
on the notion of intervals introduced earlier. We say that a predicate is an interval
predicate if there exist a unique initial cut C and a unique final cut D that satisfy
the predicate, and the predicate holds in all cuts between C and D. An interval predicate
with initial and final cuts C and D defines an interval(C, D) in a computation
(E, →). Observe that interval(C, D) may partition the lattice of consistent cuts of a
computation as in Figure 4. The patterned region in the figure denotes the cuts that belong
to interval(C, D), i.e., the set of cuts that satisfy the interval predicate. A cut F belongs
to partition I if C ⊈ F ⊆ D, partition II if C ⊈ F ⊈ D, partition III if C ⊆ F ⊆ D,
and partition IV if C ⊆ F ⊈ D. Given that interval(C, D) exists, that is, partition III
exists, the other partitions may not exist. For example, if C is the initial consistent cut of
(E, →) and D is the final consistent cut of (E, →), then only partition III exists.
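
The four partitions are determined by two subset tests, which the following Java sketch makes explicit. It assumes, purely for illustration, that cuts are encoded as bitmasks of events; the class and method names are hypothetical.

```java
/** Illustrative sketch: classify a cut F relative to interval(C, D), with cuts as event bitmasks. */
final class PartitionSketch {
    /** True iff every event of a is also an event of b. */
    static boolean subset(int a, int b) {
        return (a & ~b) == 0;
    }

    /** Returns 1, 2, 3 or 4 for partitions I, II, III or IV. */
    static int partitionOf(int c, int d, int f) {
        boolean aboveC = subset(c, f); // C is contained in F
        boolean belowD = subset(f, d); // F is contained in D
        if (aboveC && belowD) {
            return 3; // III: F lies inside interval(C, D)
        }
        if (aboveC) {
            return 4; // IV: contains C but is not contained in D
        }
        if (belowD) {
            return 1; // I: contained in D but does not contain C
        }
        return 2;     // II: neither contains C nor is contained in D
    }
}
```

With C = 0b0001 and D = 0b0111, the empty cut falls in partition I, 0b0011 in partition III, 0b1001 in partition IV and 0b1000 in partition II.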

Theorem 4. Given a computation (E, →) and an interval predicate p with interval(C,
D), there exists a consistent cut F that belongs to partition II in (E, →) iff definitely : p
does not hold in (E, →).

Proof. ⇒:

We know that F is reachable from the initial cut ⊥. For the purpose of contradiction, assume
that there exists a cut H on a path from ⊥ to F such that H satisfies p. For F to be
reachable from H, we must have H ⊆ F. However, since H satisfies p, C ⊆ H and hence
C ⊆ F; since F is in partition II, C ⊈ F, and we have a contradiction. Similarly, we can
show that there does not exist a cut H' on a path from F to the final cut E such that H'
satisfies p. C cannot be ⊥ and D cannot be E because we assume that partition II exists.
Therefore, partitions I and IV also exist. Now we obtain a path where all cuts satisfy ¬p
by starting from ⊥ and following an arbitrary path in partition I such that the path reaches
F in partition II. Then we follow an arbitrary path from F to the final consistent cut.

Fig. 4. interval(C, D) partitions the lattice of consistent cuts

⇐:

We prove by contradiction. Suppose that partition II does not exist and there exists a
path in (E, →) from the initial to the final consistent cut where all cuts on the path satisfy
¬p. Since there exists such a path, partitions I and IV exist. Otherwise,
C = ⊥ or D = E, and we do not have a path from ⊥ to E where ¬p holds on the
path. Since partition II does not exist and a path of cuts satisfying ¬p exists, there is a
path from partition I to partition IV without passing through partition III (since p is an
interval predicate). We will show that this is impossible.

Consider two cuts, F and H, on a path from ⊥ to E where ¬p holds on the path,
such that F belongs to partition I, H belongs to partition IV, and H is a successor
of F. From the definition of the partitions, we have C ⊈ F ⊆ D and C ⊆ H ⊈ D.
Furthermore, from the definition of the successor of a cut, we know that H = F ∪ {e},
where e is an event in (E, →) and e ∉ F. To obtain H from F, two requirements must
hold. On the one hand, we must add an event e ∉ D to F (therefore e ∉ C) so that H ⊈ D.
On the other hand, we must add an event e ∈ C to F (therefore e ∈ D) so that C ⊆ H.
However, e ∈ D and e ∉ D is a contradiction. □

We present a weaker result for regular predicates. The necessary conditions of Theorems
2 and 3 are not comparable with the condition of Theorem 5 below. Furthermore, observe
that the converse of the next condition is false.

Theorem 5. Given a computation (E, →) and a regular predicate p with interval(C,
D), where C = inf(p) and D = sup(p), if there exists a consistent cut F that belongs
to partition II in (E, →), then definitely : p does not hold in (E, →).

We can use a technique called slicing, which we explain next, to detect whether there
exists a consistent cut F in partition II. The overall complexity of checking the existence
of F using slicing is O(n²|E|²) [20].

6 Polynomial-Time Sufficient Condition

We have advocated the use of a technique called computation slicing for predicate
detection in [9,20,25]. The notion of a computation slice is based on Birkhoff's Representation
Theorem for Finite Distributive Lattices [5]. Readers who are not familiar with earlier
papers on slicing [9,20,25] are strongly urged to read the extended version of this paper in
[26].

Fig. 5. (a) A computation (initially x = 0, y = 0) (b) slice with respect to x ≠ 4 (c) lattice of consistent cuts of the computation

We also use a directed graph model of a computation to handle both computations
and computation slices in a uniform and convenient manner. In this model, a distributed
computation (E, →) is a directed graph with the set of events as vertices and → as
edges. A subset of vertices forms a consistent cut if the subset contains a vertex only if it
also contains all its incoming neighbours. Observe that a consistent cut either contains
all vertices in a strongly connected component or none of them. Roughly speaking, a
computation slice (or simply a slice) is a concise representation of all those consistent
cuts of the computation that satisfy the predicate. More precisely:

Definition 1 (slice [20]). A slice of a computation with respect to a predicate is a directed 
graph with the least number of consistent cuts that contains all consistent cuts of the 
given computation for which the predicate evaluates to true. 

We denote the slice of a computation (E, →) with respect to a predicate p by
slice((E, →), p). It was shown in [20] that the slice exists and is uniquely defined
for all predicates. Intuitively, the consistent cuts that belong to the slice are obtained by
computing the union and intersection closure of the cuts in the computation that satisfy
the predicate. In other words, if two cuts G and H satisfy p, then the slice contains the cuts
G ∪ H and G ∩ H too.
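
The closure described above can be sketched in Java as a simple fixpoint computation. This is an illustration only, assuming cuts are encoded as bitmasks of events; it enumerates cuts explicitly and is therefore not the efficient slicing algorithm of [20], which avoids building the set of cuts.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Illustrative sketch: close a set of cuts (event bitmasks) under union and intersection. */
final class ClosureSketch {
    static Set<Integer> close(Set<Integer> cuts) {
        Set<Integer> closed = new TreeSet<>(cuts);
        boolean changed = true;
        while (changed) {
            changed = false;
            // Iterate over a snapshot so that new cuts can be added during the pass.
            List<Integer> snapshot = new ArrayList<>(closed);
            for (int g : snapshot) {
                for (int h : snapshot) {
                    // Bitwise OR is set union, bitwise AND is set intersection.
                    changed |= closed.add(g | h) | closed.add(g & h);
                }
            }
        }
        return closed;
    }
}
```

For instance, closing the two cuts 0b01 and 0b10 adds their intersection 0b00 and their union 0b11.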

Given a computation as in Figure 5(a) and a regular predicate p such as (x = 4),
where x is a local variable of one process, we now consider the slice of
the computation with respect to ¬p, as displayed in Figure 5(b). The consistent cuts
that belong to the slice are denoted by white filled circles in Figure 5(c). Note that
in this example the cuts that belong to the slice are already closed under union and
intersection. We make the following two observations on the computation and its slice.
First, consider the cuts in the computation. On every path from the initial to the final
consistent cut there is a consistent cut that contains event e2 but not e3. These cuts are
{e2, f1}, {e2, f2}, {e2, f3}. Furthermore, all of these cuts satisfy p. Second, consider
the cuts in the slice. When the slice contains a non-trivial strongly connected component,
such as {e2, e3} in Figure 5(b), none of the cuts of the original computation that
contain a single element from this component belongs to the slice. For example, cuts
that contain only e2 but not e3 do not belong to the slice.

Fig. 6. (a) A computation (b) corresponding lattice

From the above two observations, if the slice for ¬p contains a non-trivial strongly
connected component then in the computation, on every path from the initial to the final
consistent cut, there exists a consistent cut that satisfies p and does not belong to the
slice. Therefore, definitely : p holds. We can use these observations to state a sufficient
condition for detecting definitely : p.

Theorem 6 (SC). Given a computation (E, →) and a regular predicate p, if
slice((E, →), ¬p) contains a non-trivial strongly connected component, then
definitely : p holds in (E, →).

We can check this condition by finding the slice in O(n²|E|²) time [20] and then
checking the strongly connected components of the slice in O(n|E|) time [20].

The converse of Theorem 6 is false. Figure 6(a) displays a computation that satisfies
definitely : p. When we compute the union and intersection closure of the cuts that
satisfy the predicate (the closure of the white filled circles), we obtain the set of consistent
cuts that belongs to the computation; that is, slice((E, →), ¬p) has the same set of
cuts as (E, →). Therefore, the slice does not contain a non-trivial strongly connected
component that is not in (E, →).

Another advantage of slicing is that we can use the slice with respect to ¬p instead
of the computation to obtain a smaller number of linearizations for the first polynomial-space
algorithm explained in Section 4.

7 Experimental Results 

We implemented the conditions in this paper in POTA and applied them to a leader election
protocol and the General Inter-ORB Protocol (GIOP).

Fig. 7. Leader election verification results; SPIN runs out of memory for > 11 processes



The leader election protocol [8] implements the Chang-Roberts algorithm, where
processes are arranged in a unidirectional ring. We check

definitely : (done_0 ∧ done_1 ∧ ... ∧ done_{n-1})

which denotes that eventually a leader is chosen by every process.

The General Inter-ORB Protocol (GIOP) [16] is the abstract protocol that is used
for communication between CORBA ORBs. It specifies the transfer syntax and a standard
set of message formats for ORB interoperation over any connection-oriented transport
protocol. GIOP is designed to be simple and easy to implement, while still allowing
for reasonable scalability and performance. We check

definitely :(U RequestSenty U Reply Received^ C RequestSenty C Reply Received) 

which denotes that a process is always finally in one of its local states. 

In order to evaluate the effectiveness of our conditions, we compare our approach 
with a partial order reduction based model checker SPIN [13]. For this purpose, we used 
the translator from execution traces to Promela (input language of SPIN) implemented 
in POTA. We restricted the memory usage to 512MB. We manually instrumented the 
programs. The computations are obtained by running the program for 20 seconds. Our 
results are shown in Figure 7 and Figure 8. The POTA line denotes experiments performed 
by applying all three Theorems 1, 2, 6. 

For the GIOP protocol, SPIN took 134 s and 278.9 MB for 10 processes. SPIN ran out
of memory for > 10 processes. Observe that our improvement in space and time
performance is of orders of magnitude. The conditions in Theorems 2 and 6 allow us
to obtain large performance gains, as illustrated for the GIOP protocol. However, even when
these conditions do not hold, as for the leader election protocol, we still obtain an improvement
in space and time performance, although not in the number of processes that can be handled.

Due to lack of space further experimental results are not reported here but these 
results and their detailed explanations are available at POTA website [23]. 
Fig. 8. GIOP verification results: SPIN runs out of memory for > 10 processes



8 Conclusion 

We presented space and time efficient algorithms for testing programs with respect to 
definitely: p predicates. Earlier, we developed polynomial time detection algorithms 
in POTA for predicates from a subset of the temporal logic CTL that did not include 
definitely modality. We can enlarge this subset by the results of this paper. 

Abstract. Most efforts to combine formal methods and software testing go in
the direction of exploiting formal methods to solve testing problems, most commonly
test case generation. Here we take the reverse viewpoint and show how
the technique of partition testing can be used to improve a formal proof technique
(induction for correctness of loops). We first compute a partition of the domain of
the induction variable, based on the branch predicates in the program code of the
loop we wish to prove. Based on this partition we derive a partitioned induction
rule, which is (hopefully) easier to use than the standard induction rule. In particular,
with an induction rule that is tailored to the program to be verified, less
user interaction can be expected to be required in the proof. We demonstrate with
a number of examples the practical efficiency of our method.



1 Introduction 

Testing and formal verification at first glance seem to be at opposing ends in the spec- 
trum of techniques for software quality assurance. Testing is a core technique used by 
practitioners every day, while formal verification is difficult to master, and employed 
mostly by specialists in academia. Most practitioners agree that formal verification is 
too cumbersome and difficult to be useful in practice. On the other hand, testing cannot 
be used on its own to prove the absence of errors, because exhaustive testing is usually 
impossible. In practice, one stops testing once the number of found errors drops below 
a certain threshold (or simply when the testing budget is used up). Formal verification, 
although costly, can ensure that a program meets its (formal) specification for any in- 
put. Given this state of affairs, it might seem surprising that testing and verification can 
fruitfully interact - nevertheless, this is what we want to show in the present paper. 

There is one fairly established connection between formal methods and testing (documented,
for example, in several papers collected in these proceedings): test case generation
from formal specifications. The presence of a formal specification can also solve the
oracle problem. One obstacle to this approach is that the availability of a formal specification
is the exception rather than the rule. On the other hand, if the cost of providing a formal
specification has been invested already, one can use it as a basis not only for testing,
but even for formal source code verification. The contribution of this paper is to show
that techniques from testing can considerably simplify the verification effort. Hence,
the availability of a formal specification is doubly useful: on the one hand, with by now
established techniques one can generate test cases automatically. In addition, as we show
below, by employing techniques from testing, even formal verification may come within
reach. We see our work as a first step towards a framework where both testing and
verification can be usefully combined.

Partition testing is a software testing technique used to systematically reduce test
volume. A program's possibly infinite input space is divided into a finite number of disjoint
subdomains. Testing is done by picking one or more elements from each subdomain
to form a test set that is somehow representative of the program behaviour. Ideally, all
elements in a subdomain behave in the same way with respect to the specification, that
is, they are all processed correctly or they are all processed incorrectly. Subdomains with
this property are called revealing [1] or homogeneous [2].

There is a line of work in software testing theory [3,4,1,5,2] where it is shown that
testing can be used to show the absence of errors provided that certain properties of
the test case selection are fulfilled. In the context of partition testing, the sought-after
property is that subdomains are revealing. Unfortunately, establishing this property in
practice usually means giving a formal correctness proof (for each subdomain). Hence,
given the difficulties of general theorem proving, this was often discarded as impractical.
Our results may be considered a step towards obtaining such correctness proofs
in practice, because they suggest that proving correctness for each subdomain separately
requires less user interaction than giving a proof simultaneously for the entire domain
(as is usually done in theorem proving).

In a nutshell, here is what we do: the implementation basis for our work is a software
verification system for the programming language Java Card called KeY [6]. The
verification paradigm of KeY is to execute programs with symbolic values, which then
are checked (symbolically) against the formal specification. More exactly, KeY is based
on a first-order dynamic logic with arithmetic [7,8]. It uses a sound and relatively complete
calculus which contains rules mimicking symbolic execution. This idea was first
presented in [9] and formalized in [10,11].

The main obstacle in automating software verification to an acceptable degree is
the handling of programs with loops or recursive methods. These constructs require
induction on one of the inductive data structures occurring in the program (for example,
numbers or lists). The difficulty is to find a suitable induction hypothesis. This can be a
formidable challenge even for formal methods experts. The complexity of the induction,
of course, depends on the complexity of the loop or method body and the postcondition at
hand. In simple cases, the induction can be performed automatically. Therefore, it would
be extremely beneficial to simplify the required induction hypotheses. The key insight
that we work out in the present paper is that the technique of partition testing is in fact
a fairly general and automatic divide-and-conquer concept that can be used to simplify
inductions in formal verification proofs.

Roughly speaking, to verify a loop, we use a white-box partition analysis based on
the branch predicates of its body and condition to compute a partition of the domain of
the induction variable. This partition is then used to derive (mechanically) an induction
rule which takes the partition into account: let us call the standard induction rule for
natural numbers the rule that allows one to conclude that a statement φ(n) holds for all n ∈ ℕ
provided that it holds for the single base case ("φ(0)") and for the single step case ("for
any i, if φ(i) then φ(i + 1)"). This is replaced by an induction rule that has m base cases
and r step cases, each of which matches a subdomain of the partition and, hopefully,
needs much less user interaction.

Other work that is related to ours can be found elsewhere. For instance, there was an
early effort [12] to use test data to aid in proving program correctness. In contrast to this
approach, we do not actually run any tests, but our approach relies on a test case generation
technique (partition analysis). Also, there is a recent runtime analysis technique to
generate invariants inductively from test cases, presented in [13]. For higher-order functional
programming languages, [14] describes how to formally derive induction schemata
for recursively defined functions. However, our work has the advantage of being applicable
to a real object-oriented programming language, Java Card.

The remainder of the paper is organized as follows. We start with a motivating 
example in Sect. 2. In Sect. 3 we describe the method. Then we show it at work. First, 
we revisit the introductory example (Sect. 4.1), followed by a more sophisticated problem 
(Sect. 4.2). We close by pointing out current limitations (and, hence, future work). 

2 Motivating Example 

In this section we describe a simple example of a loop that cannot be proved
(without complex user interaction) using a standard induction rule, but is easy to prove with
our approach. The description here is brief. Our method is explained in detail in the
following section. Here is the Java Card code of the loop:

    final int c = ...;
    int i;

    while (i > 0) {
        if (i >= c) {
            i = i - c;
        } else {
            i--;
        }
    }

For this while-loop to terminate in a state where i = 0, the precondition must guarantee that
i ≥ 0 and c ≥ 1; c is constant. In dynamic logic (DL for short; the essentials of our
logical framework are described in Sect. 4.1) the proof obligation is ∀i · φ(i), where
φ(i) is:

    i ≥ 0 ∧ c ≥ 1 →
    ⟨ while (i > 0) {
          if (i >= c) {
              i = i - c;
          } else {
              i--;
          }
      } ⟩ i = 0
The formula contains a total correctness assertion: the program within the brackets ⟨ ⟩
(here the code of the while-loop) terminates, and in the final state the postcondition
following the brackets must hold (here i = 0).
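
To get a feeling for what the total correctness assertion claims, the loop can simply be run on concrete inputs. The Java sketch below does this; the class and method names are hypothetical, and plain Java is used rather than Java Card. Running it for a range of values is of course only testing: it does not replace the proof for all i and c that the formula demands.

```java
/** Illustrative sketch: the example loop wrapped in a method so that it can be exercised. */
final class LoopSketch {
    /** Runs the loop starting from the given i with constant c and returns the final value of i. */
    static int run(int i, final int c) {
        while (i > 0) {
            if (i >= c) {
                i = i - c;
            } else {
                i--;
            }
        }
        return i;
    }
}
```

For every start value i ≥ 0 and constant c ≥ 1 that one tries, the method terminates and returns 0, which is the behaviour the proof obligation asserts in general.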

The simplest possible choice for the induction hypothesis when proving correctness
of the loop is to take φ(n). It is completely schematic and requires no interaction with
the user. This hypothesis, however, is too weak when using the standard induction rule.
Roughly speaking, in a proof attempt of the standard step case, ∀n ∈ ℕ · φ(n) → φ(n + 1),
the following happens: the while-loop is unwound for n + 1 and the proof branches at the
if-statement. One case (the one with "i--;") is possible to prove, because "(n + 1)--;"
is equal to n after symbolic execution. The proof obligation for this case simplifies to
∀n ∈ ℕ · φ(n) ∧ n < c → φ(n), which is valid. In the other case symbolic execution gives
n + 1 - c, so that the resulting proof obligation ∀n ∈ ℕ · φ(n) ∧ n ≥ c → φ(n + 1 - c) is
in general unprovable. With standard induction, a more powerful induction hypothesis
must be found, which is a difficult task for a user with no training in formal methods!

In our approach we instead mechanically create a new, partitioned induction rule.
For our example loop the partitioned induction rule has two base cases and one step case:

    φ(0)                             (1)
    φ(1) ∧ ··· ∧ φ(c - 1)            (2)
    ∀n ∈ ℕ · φ(n) → φ(n + c)         (3)

These are constructed from a branch coverage partition of the induction variable i. For
instance, (2) above corresponds to the subdomain with all values of i causing the "else"
branch inside the loop to be executed. The creation of the partitioned induction rule for
this particular example is described in more detail in Sect. 4.1. Note that this partitioned
induction rule is powerful enough to make the proof go through automatically with the
unchanged induction hypothesis φ(n); this is just what we desire in our effort to minimise
the user interaction.

3 Computing Partitioned Induction Rules 

The idea is to create for each loop that we want to prove a new, tailor-made induction 
rule based on partitions. A partition is used, through the new induction rule, to divide 
the proof into smaller and hopefully simpler (in terms of user interaction) parts. Here is 
an overview of the method: 

1. Compute a partition based on the branch predicates in the program code. We employ
techniques readily available in the software testing community. Details are out of 
the scope of this paper, but the approach we use here is similar to the construction 
of the implementation partition in [5]. 

2. Refine this partition, thereby making use of the implicit case distinction contained
in operators (such as mod or ÷) that occur in branch predicates. The goal of the
partition refinement is to arrive at subdomains of a syntactic form that is suitable for
generation of the new induction rule.

3. Based on the refined partition, create a new (program-specific) induction rule with 
one base case for each finite subdomain of the partition and one step case for each 
infinite subdomain. 







Reiner Hahnle and Angela Wallenburg 



4. Prove correctness of the loop as usual, but use the new induction rule. This requires 
typically less user interaction than with the standard induction rule. 

Now to the details. Specifically, assume that we have a program loop with an input domain,
or in our case, a domain of the variable that we want to perform induction over: D ⊆ ℕ.
From a partition analysis as described in step 1 above we obtain a finite number of
disjoint subdomains, say, D = D_1 ∪ ··· ∪ D_m. Let d_i be the characteristic predicate of D_i
for each i ∈ {1, ..., m}, so that x ∈ D_i iff d_i(x) holds. Hence, x ∈ D iff d_1(x) ∨ ··· ∨ d_m(x).
The d_i are called branch predicates.

The branch predicates originate from the branching conditions in the program code
and might contain operators defined by case distinction, for instance ÷, mod, and ≥.
These implicit case distinctions drive further partitioning.

For each such operator, if necessary, we create a partition such that each case distinction
in the definition of the operator gives rise to a new subdomain. In the future, we plan
to create a library of partitions for all operators that occur in Java Card expressions and
the standard Java Card API, so that refining partitions can be looked up mechanically. In
general, we strive to refine the original partition to obtain new subdomains of a particular
syntactic form:

1. {} (the empty set)

2. A finite set {x_1, ..., x_k}. Such a set is important to distinguish because it can quite
simply be used as a base case in the new induction rule.

3. An infinite set of the form {λx.f(x) | x ∈ C}, where C ⊆ ℕ. It is important
that f(x) always increases its argument, because it is eventually to be used as an
induction step in our new induction rule. We use λx.f(x) because we aim for an
expression (and not the value of a function) to describe a set of values that we want
to perform induction over.

Here is an example of a partition refinement based on an operator definition by case
distinction: in the example from Sect. 2 one of the branch conditions contains the operator
≥, which has an implicit case distinction. We use the following definition of ≥:

    x ≥ z  =  true   if ∃y ∈ ℕ · (x = z + y)
              false  if ∃y ∈ ℕ · (x = z - 1 - y)

Each case directly gives a simple expression of the desired form: λn.(c + n) and
λn.(c - 1 - n), respectively, would be used to refine a subdomain defined by the predicate i ≥ c.

More precisely, we refine the subdomain of the original partition ({i ∈ ℕ | i ≥ c})
into two new subdomains by replacing i with (c + n) and (c - 1 - n) respectively. We
then get {c - 1 - n | n ∈ ℕ ∧ c - 1 - n ≥ c} = {} and {c + n | n ∈ ℕ ∧ c + n ≥ c} =
{c + n | n ∈ ℕ}, which are both of the form required above. The latter can be used to
derive an induction step case.

Assume now that we have a refined partition of the syntactic form detailed above,
where operators with implicit case distinctions are eliminated.

We create a new induction rule with the following set of proof obligations:

1. For each non-empty finite subdomain {x_1, ..., x_k}, we create a base case consisting
of the proof obligation

    φ(x_1) ∧ ··· ∧ φ(x_k)

2. For each infinite subdomain D_i a new step case needs to be proven:

    ∀n ∈ C_i · φ(n) → φ((λx.f_i(x)) n)

For the new induction rule to be sound it is important that some criteria are fulblled: 

1. For each step case of the form \/n G Ci ■ 4>{n) — >■ 4>{{Xx.fi{x))n) the following 
holds: 

'inG Ci- fi{n) > n 

This is to ensure that it really is a step, and it is achieved by constructing fi (x) such 
that it increases its argument. 

2. Each element of the domain D of the induction variable is covered in at least one 
of the step or base cases. Let B be the union of all finite subdomains giving rise to 
a base case and let /i, . . . , /^ be the functions that dehne the step cases. Then we 
require 



yx G D ■ {3k G {I, . . . ,r} ■ 3y G Ck ■ X = fk{y)) ^ x G B 

This property is guaranteed by construction, because the partition property is invari- 
ant in the process; we only refine partitions or do not change them at all. 

The first property entails that the minimal element of $D$ cannot be in the subdomain
defined by any step case. The second property says that every element of $D$ is in either a
step case or a base case. As a consequence, there must be at least one base case of the
induction.
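
The two criteria can be sanity-checked on a finite prefix of the natural numbers. The following Java sketch is only an illustration and not part of the KeY implementation: the class `PartitionedRuleCheck`, its nested `StepCase` type and the `bound` parameter are hypothetical names, the domain is assumed to be N, and a bounded check can of course only refute a rule, never prove it sound.

```java
import java.util.List;
import java.util.Set;
import java.util.function.IntPredicate;
import java.util.function.IntUnaryOperator;

/** Bounded sanity check of the two soundness criteria (hypothetical helper, D = N). */
final class PartitionedRuleCheck {

    /** One step case: for all n in C, phi(n) -> phi(f(n)). */
    static final class StepCase {
        final IntPredicate inC;    // membership test for the index set C_i
        final IntUnaryOperator f;  // the step function f_i

        StepCase(IntPredicate inC, IntUnaryOperator f) {
            this.inC = inC;
            this.f = f;
        }
    }

    /** Criterion 1 on 0..bound: f_i(n) > n for every n in C_i. */
    static boolean stepsIncrease(List<StepCase> steps, int bound) {
        for (StepCase s : steps) {
            for (int n = 0; n <= bound; n++) {
                if (s.inC.test(n) && s.f.applyAsInt(n) <= n) {
                    return false;
                }
            }
        }
        return true;
    }

    /** Criterion 2 on 0..bound: every x is in B or equals f_k(y) for some y in C_k. */
    static boolean covers(Set<Integer> base, List<StepCase> steps, int bound) {
        for (int x = 0; x <= bound; x++) {
            boolean covered = base.contains(x);
            for (StepCase s : steps) {
                // Searching y < x is enough once criterion 1 holds, since then f(y) > y.
                for (int y = 0; y < x && !covered; y++) {
                    covered = s.inC.test(y) && s.f.applyAsInt(y) == x;
                }
            }
            if (!covered) {
                return false;
            }
        }
        return true;
    }
}
```

For instance, with base set {0, 1, 2} and the single step function n -> 3 + n on C = N, both checks succeed; removing 1 and 2 from the base set makes the coverage check fail.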

4 Examples 

4.1 Simple Example Revisited 

Now we return to the motivating example from Sect. 2 and show how we actually 
computed the partitioned induction rule for it. 

In KeY, the logical infrastructure is Java Card DL [8], an extension of dynamic logic
(DL) [7] to handle side effects, aliasing, exceptions and other complications of a real
object-oriented programming language such as Java Card. In DL, a formula $\varphi \to \langle p \rangle \psi$
is valid if for every state $s$ satisfying precondition $\varphi$ a run of the program $p$ starting in $s$
terminates, and in the terminating state the post-condition $\psi$ holds. The proof obligation
from Sect. 2, which was written using pure DL, is slightly more complicated in Java
Card DL. Our proof obligation is $\forall i_l \cdot \phi(i_l)$. Let $\phi(i_l)$ be the following formula:

$i_l \geq 0 \land c_l \geq 1 \to$

$\{i := i_l\}\{c := c_l\}\langle$ while (i > 0) {
    if (i >= c) {
        i = i - c;
    } else {
        i--;
    }
} $\rangle\; i = 0$

The curly brackets in front of the formula are called (state) updates. Updates are the
Java Card DL solution to deal with aliasing and assignment in the calculus. They are
basically primitive assignments of the form {loc := val} where val must be a logical
(side effect free) term and loc a program variable.

To prove correctness of the loop we need to perform induction on the variable i. In
Java Card DL one cannot quantify over program variables, so the induction variable is
a corresponding logical variable $i_l$. The domain of the induction variable is $\mathbb{N}$. From the
branching conditions in the program we obtain the first partition of i’s domain:

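$D_1 = \{x \in \mathbb{N} \mid d_1(x)\} = \{x \in \mathbb{N} \mid x \leq 0\} = \{0\}$

$D_2 = \{x \in \mathbb{N} \mid d_2(x)\} = \{x \in \mathbb{N} \mid x > 0 \land x < c\} = \{1, \ldots, c - 1\}$

$D_3 = \{x \in \mathbb{N} \mid d_3(x)\} = \{x \in \mathbb{N} \mid x > 0 \land x \geq c\} = \{x \in \mathbb{N} \mid x \geq c\}$

The subdomains $D_1 = \{0\}$ and $D_2 = \{1, \ldots, c - 1\}$ are finite and thus already in one
of our desired formats.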

Then, to refine/rewrite the original subdomain $D_3$, remember from Sect. 3 that for
the operator $x \geq z$, we may use the expressions $\lambda y.z + y$ and $\lambda y.z - 1 - y$ to refine a
subdomain. This gives a refinement of $D_3 = \{x \in \mathbb{N} \mid x \geq c\}$ into two new subdomains
$D_3 = D_{31} \cup D_{32}$, where

$D_{31}$ = (replace $x$ in $D_3$ with $c + y$)
$= \{c + y \mid y \in \mathbb{N} \land c + y \geq c\}$
$= \{c + y \mid y \in \mathbb{N}\}$

$D_{32}$ = (replace $x$ in $D_3$ with $c - 1 - y$)
$= \{c - 1 - y \mid y \in \mathbb{N} \land c - 1 - y \geq c\}$
$= \{\}$

So the new subdomains $D_1$, $D_2$ and $D_{31}$ are of the form we need to construct the
new induction rule. To prove $\forall n \in \mathbb{N} \cdot \phi(n)$, it is then enough to prove

$\phi(0)$    (1)

$\phi(1) \land \cdots \land \phi(c - 1)$    (2)

$\forall n \in \mathbb{N} \cdot \phi(n) \to \phi(c + n)$    (3)

where (1) is a base case and covers $D_1$, (2) is also a base case and covers $D_2$, and (3) is
a step case that covers all elements in the subdomain $D_{31}$.
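
To see why step case (3) fits the loop, it can help to execute the program. The Java sketch below is an illustration only, with hypothetical names (`SimpleLoopExample`, `step`, `run`, `subdomain`); it assumes i >= 0 and c >= 1 as in the precondition. Starting from i = c + n, a single loop iteration yields i = n, which is exactly the value for which the induction hypothesis is available.

```java
/** The loop from the simple example, with helpers mirroring the subdomains D1, D2 and D31. */
final class SimpleLoopExample {

    /** One iteration of the loop body; assumes i > 0 and c >= 1. */
    static int step(int i, int c) {
        return (i >= c) ? i - c : i - 1;
    }

    /** Runs the whole loop and returns the final value of i; assumes i >= 0 and c >= 1. */
    static int run(int i, int c) {
        while (i > 0) {
            i = step(i, c);
        }
        return i;
    }

    /** Names the subdomain of the refined partition that contains i (i >= 0, c >= 1). */
    static String subdomain(int i, int c) {
        if (i == 0) {
            return "D1";
        }
        if (i < c) {
            return "D2";
        }
        return "D31"; // i = c + n with n = i - c >= 0
    }
}
```

Here `step(c + n, c)` returns `n` for every n >= 0, and `run` always ends with 0 under the stated assumptions, matching the post-condition i = 0.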

The proving process in KeY is partially automated, though it is an interactive theorem
prover. When using the partitioned induction rule above, the following kinds of user
interaction are required to complete the proof:

Instantiation means a single quantifier elimination by supplying a suitable instance
term. In the KeY system, the user can simply drag-and-drop the desired term.

Induction rule application: when applying the partitioned induction rule, one can state
the induction hypothesis by drag-and-dropping the existing proof obligation, and
then pick the induction variable.

Unwinding of the loop needs to be initiated, but is done automatically.

Decision procedure is an automatic procedure that tries to decide the validity of
arithmetic expressions over the integers. The decision procedure is sound but not
complete. The user decides when (if) to run it.

Compared to the user interaction needed when using standard induction, this is less 
complicated. Using standard induction, if one uses the unmodified induction hypothesis 
and the same induction variable as above, one is left with an open proof goal and no 
rules to apply: one has to figure out a strong enough induction hypothesis. 

4.2 Russian Multiplication Example 

Let us see how the method works for proving the correctness of a more complicated
algorithm, Russian multiplication. The loop has more complicated control flow than in
the previous example.

int russianMultiplication(int a, int b) {
    int z = 0;
    while (a != 0) {
        if (a % 2 != 0) {   // "a mod 2" in the text
            z = z + b;
        }
        a = a / 2;
        b = b * 2;
    }
    return z;
}

For this loop we have the precondition $a_0 \geq 0$ and the post-condition $z = z_0 + a_0 * b_0$,
where $a_0$, $b_0$, $z_0$ are the values of a, b and z before the loop. In Java Card DL the proof
obligation for the total correctness of this loop is $\forall a_0 \cdot \phi(a_0)$, where $\phi(a_0)$ is

$\forall b_0 \cdot \forall z_0 \cdot a_0 \geq 0 \to$

$\{a := a_0\}\{b := b_0\}\{z := z_0\}\langle$ while (a != 0) {
    if (a mod 2 != 0) {
        z = z + b;
    }
    a = a / 2;
    b = b * 2;
} $\rangle\; z = z_0 + a_0 * b_0$

where $a_0$, $b_0$ and $z_0$ are new logical variables.

This cannot be proven using standard induction unless the induction hypothesis is
strengthened in a non-trivial way. In an attempt to prove the standard step case using
$\phi(a)$ as the induction hypothesis in $\forall a \in \mathbb{N} \cdot \phi(a) \to \phi(a + 1)$, after unwinding and
symbolically executing the loop we end up with $\forall a \in \mathbb{N} \cdot \phi(a) \to \phi((a + 1)/2)$, which
is unprovable without induction.

Now let us compute a partitioned induction rule for this loop instead. The induction
variable is a (the corresponding logical variable is $a_0$). Its domain is $\mathbb{N}$ and the first
partitioning, using branch predicates, gives the following subdomains:



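$D_1 = \{x \in \mathbb{N} \mid d_1(x)\} = \{x \in \mathbb{N} \mid x = 0\} = \{0\}$

$D_2 = \{x \in \mathbb{N} \mid d_2(x)\} = \{x \in \mathbb{N} \mid x \neq 0 \land x \bmod 2 \neq 0\}$

$D_3 = \{x \in \mathbb{N} \mid d_3(x)\} = \{x \in \mathbb{N} \mid x \neq 0 \land x \bmod 2 = 0\}$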



In words, we have the singleton set containing zero and the sets with the odd and (non- 
zero) even numbers respectively. 

Consider the branch predicate $d_3(x) \equiv x \neq 0 \land x \bmod 2 = 0$, which defines
subdomain $D_3$. The definition of $d_3(x)$ contains an operator with implicit case distinction:
mod. We look up the definition of mod 2:

$x \bmod 2 = \begin{cases} 0 & \text{if } \exists y \in \mathbb{N} \cdot (x = 2 * y) \\ 1 & \text{if } \exists y \in \mathbb{N} \cdot (x = 2 * y + 1) \end{cases}$

Hence, we use the expressions $\lambda y.2 * y$ and $\lambda y.2 * y + 1$ to refine the original partition.
Using the case distinction in the definition of $x \bmod 2$ gives us the refinement of the
original subdomain $D_3 = \{x \in \mathbb{N} \mid x \neq 0 \land x \bmod 2 = 0\}$ into two new subdomains
$D_3 = D_{31} \cup D_{32}$:



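$D_{31}$ = (replace $x$ in $D_3$ with $2 * y$)
$= \{2 * y \mid y \in \mathbb{N} \land (2 * y) \neq 0 \land (2 * y) \bmod 2 = 0\}$
$= \{2 * y \mid y \in \mathbb{N} \land y \neq 0 \land 0 = 0\}$
$= \{2 * y \mid y \in \mathbb{N}_1\}$

$D_{32}$ = (replace $x$ in $D_3$ with $2 * y + 1$)
$= \{2 * y + 1 \mid y \in \mathbb{N} \land 2 * y + 1 \neq 0 \land (2 * y + 1) \bmod 2 = 0\}$
$= \{2 * y + 1 \mid y \in \mathbb{N} \land 1 = 0\}$
$= \{\}$

Similarly, for the branch predicate $d_2(x)$ of the original partition, we get

$D_{21} = \{2 * y + 1 \mid y \in \mathbb{N}\}$

$D_{22} = \{\}$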



After refinement, we have non-empty subdomains of the form:

$D_1 = \{0\}$

$D_{21} = \{2 * y + 1 \mid y \in \mathbb{N}\}$

$D_{31} = \{2 * y \mid y \in \mathbb{N}_1\}$

Thus, the new subdomains have the syntactic form we need to construct an induction
rule. With this rule, to prove $\forall n \in \mathbb{N} \cdot \phi(n)$, it is enough to prove that

$\phi(0)$    (1)

$\forall n \in \mathbb{N}_1 \cdot \phi(n) \to \phi(2 * n)$    (2)

$\forall n \in \mathbb{N} \cdot \phi(n) \to \phi(2 * n + 1)$    (3)

where (1) is the base case, covering $D_1$, (2) the first step case, covering all elements in
the subdomain $D_{31}$ and (3) the second step case, covering $D_{21}$.

To prove the Russian multiplication algorithm in KeY with the partitioned induction
rule the required user interactions are basically the same as in the previous example. In 
particular, the induction now goes through completely unmodified. 
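
The fit between the step cases and the loop can again be observed by running the code. The following Java sketch is illustrative only; the class `RussianMultiplicationExample` and the helpers `nextA` and `subdomain` are hypothetical names, and a >= 0 is assumed so that Java's `%` and `/` agree with mod and integer division on the naturals. One iteration maps both a = 2 * n and a = 2 * n + 1 to a = n, which is where the induction hypothesis of step cases (2) and (3) applies.

```java
/** Russian multiplication with helpers mirroring the subdomains D1, D21 and D31. */
final class RussianMultiplicationExample {

    /** The loop from the text; assumes a >= 0 and that no int overflow occurs. */
    static int russianMultiplication(int a, int b) {
        int z = 0;
        while (a != 0) {
            if (a % 2 != 0) {
                z = z + b;
            }
            a = a / 2;
            b = b * 2;
        }
        return z;
    }

    /** Value of a after one loop iteration; assumes a > 0. */
    static int nextA(int a) {
        return a / 2;
    }

    /** Names the subdomain of the refined partition that contains a (a >= 0). */
    static String subdomain(int a) {
        if (a == 0) {
            return "D1";
        }
        return (a % 2 != 0) ? "D21" : "D31";
    }
}
```

For example, `russianMultiplication(13, 7)` visits a = 13, 6, 3, 1 and returns 91.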

5 Limitations and Future Work 

In this paper we demonstrated that the technique of partition testing can be turned into a 
divide-and-conquer concept to simplify inductions in formal verification proofs. We have 
defined a syntactic framework that allows us to derive tailor-made induction rules based 
on partitions in a practically efficient manner. Resulting induction rules are sound and 
complete by construction. The actual verification in KeY using the partitioned induction 
rules can often be performed automatically. Several examples were carried out not just
by hand, but as concrete experiments in an interactive theorem prover. The experimental
findings confirmed our conjecture. We think that our work is a first step towards a 
framework, where both testing and formal verification can be usefully combined. 

In the current setting, our method has a number of limitations but its reach could be 
extended considerably. For a start, we considered induction not over arbitrary inductive 
data structures, but only the natural numbers. Future work is to extend our approach to 
also include induction over lists, trees, etc. 

Our focus has been entirely on the verification of loops, and not on arbitrary programs. 
Since loops are usually the major source of complexity in verification, in testing as well 
as in theorem proving, it is here that we expect the largest gain. Still, we also wish to 
investigate the idea of partitioning proofs for loop-free programs, since it has been seen 
[ 15 ] that in the case of very large proof obligations, it is beneficial to split the proof into 
parts which can be handled separately. 

Clearly, not all induction proofs can be simplified with our approach. The crucial 
point is that our method requires that the branch predicates somehow capture what is 
being computed in the corresponding branch. This is often the case, but not always. If the 
branch predicates are completely unrelated to the induction variable, we simply get no 
information from the branch predicates on how to partition the domain of the induction 
variable. For instance in array-sorting algorithms, it is common that the induction goes 
over the indexes of the array, but the branch predicates typically have a comparison
between the elements to be sorted and these might be randomly ordered. In future work
we plan to remedy this by also using the weakest preconditions of updates to induction
variables when we refine the partition. In KeY there is already a strongest postcondition
generator.

Finally, the process of transforming general branch predicates into predicates of the
form that our method requires (using $\lambda$-expressions) is non-trivial, in particular for the
process to be mechanised. In our examples, quite simple branch conditions occurred. It
is future work to investigate what exactly can be done mechanically. It includes dealing
with predicates containing arbitrary linear operators and method calls, but we expect
quadratic operators and operators like sin to be beyond reach. However, our method
is conservative in the sense that if it does not find a useful refinement of a partition for 
a certain subdomain, the subdomain stays the same. In that case the proof will not be 
simplified, but it will not be more complicated either. 

Abstract. The use of model-checking approaches for test generation from
requirement models has been proposed by several researchers. These
approaches leverage the witness (or counter-example) generation capability
of model-checkers for constructing test cases. Test criteria are expressed
as temporal properties. Witness traces generated for these properties
are instantiated to create complete test sequences, satisfying the criteria.
State-space explosion can, however, adversely impact model-checking
and hence such test generation. Thus, there is a need to validate these
approaches against realistic industrial-sized system models to learn how
well these approaches scale. To this end, we conducted a case study using
six models of progressively increasing complexity of the mode-logic
in a flight-guidance system, written in the RSML$^{-e}$ language. We
developed a framework for specification-based test generation using the
NuSMV model-checker and code-based test case generation using Java
Pathfinder, and collected time and resource usage data for generating
test cases using symbolic, bounded, and explicit state model-checking
algorithms. This paper briefly discusses the approach, presents the
results from the study and analyzes its implications.



1 Introduction 

Software development for high assurance systems, such as the software controlling
aeronautics applications and medical devices, is a costly and time-consuming
process. In such projects, the validation and verification (V&V) phase consumes
approximately 50%-70% of the software development resources. Thus, automatic
generation of test cases from requirement specifications has found considerable 
interest in the research community. Such automation could result in dramatic 
time and cost savings, especially for verifying safety-critical systems. 

* This work has been partially supported by NASA grant NAG-1-224 and NASA 
contract NCC-01-001. We also want to thank the McKnight Foundation for their 
generous support over the years. 

Model checking techniques have been proposed as one method of achieving
this automation [3,9,2,10,20,16]. These proposed test case generation approaches
leverage the witness (or counter-example) generation capability of 
model-checkers for constructing test cases. Test criteria are expressed as tempo- 
ral properties. Witness traces generated for these properties are instantiated to 
create complete test sequences, satisfying the criteria. Nevertheless, one of the 
issues that often stymies model-checking is the state-space explosion problem. 
As the size of the state-space to be explored increases, model-checking might 
become too time-consuming or infeasible. But in the context of test generation 
based on structural properties, one is interested in falsifying properties so that 
counter-examples can be instantiated to test sequences. We have hypothesized 
that finding violations of the properties characterizing a test case is easy and 
that the counter-examples can be constructed easily even for large models. 

While these ideas are appealing, there is a need to validate the approach using
realistic models of critical systems. To this end, we conducted a case study
using six models of progressively increasing complexity of the mode-logic in
a flight-guidance system, written in the RSML$^{-e}$ language [22,23]. We developed
a framework for specification-based test generation using the NuSMV [19]
model-checker and code-based test case generation using Java Pathfinder [25]
and collected time and resource usage data for generating test cases using sym- 
bolic, bounded, and explicit state model-checking algorithms. The purpose of 
this study was to determine if a model-checking based approach to test genera- 
tion could scale to software system models of industrial size and complexity. 

To summarize our findings, our case study points out limitations of symbolic 
as well as explicit state model checkers when used for test case generation. A 
bounded model checker, however, performed very well in our application domain 
and shows great promise. 

The rest of the paper is organized as follows. Section 2 provides a short 
overview of related efforts in the area of test-generation using model checking 
techniques and briefly describes our overall approach. We describe how we conducted
our case study in Section 3, and present the FGS case example in Section 4.
Sections 5 and 6 briefly discuss RSML$^{-e}$ and the test coverage criteria
used for this study. Section 7 analyzes the results obtained from our experiments
with the RSML$^{-e}$ specification language. In Section 8 we cover our experiences
with an explicit state model checker applied to Java code. Finally, Section 9 
discusses the implications of the results and points to future studies and experi- 
ments that are further required to validate model-checking based test generation 
approaches. 

2 Finding Tests with a Model Checker 

Model checkers build a finite state transition system and exhaustively explore 
the reachable state space searching for violations of the properties under in- 
vestigation [7]. Should a property violation be detected, the model checker will 
produce a counter-example illustrating how this violation can take place. In
short, a counter-example is a sequence of inputs that will take the finite state
model from its initial state to a state where the violation occurs.

Fig. 1. Test sequence generation overview and architecture

A model checker can be used to find test cases by formulating a test criterion 
as a verification condition for the model checker. For example, we may want 
to test a transition (guarded with condition C) between states A and B in the 
formal model. We can formulate a condition describing a test case testing this 
transition - the sequence of inputs must take the model to state A; in state A, 
C must be true, and the next state must be B. This is a property expressible 
in the logics used in common model checkers, for example, the logic LTL. We 
can now challenge the model checker to find a way of getting to such a state by 
negating the property (saying that we assert that there is no such input sequence) 
and start verification. The model checker will now search for a counterexample 
demonstrating that this property is, in fact, satisfiable; such a counterexample 
constitutes a test case that will exercise the transition of interest. By repeating 
this process for each transition in the formal model, we use the model checker to 
automatically derive test sequences that will give us transition coverage of the 
model. The proposed test generation process is outlined in Figure 1. Naturally, 
the same thinking can be applied to the generation of test cases from source 
code, for example, from Java as we will illustrate later in the paper. 
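
The negation step described above can be sketched in Java as follows. This is a minimal illustration, not the generator used in the study: the class `TransitionTrapProperties` and its `Transition` type are hypothetical, and the NuSMV-style rendering of the formula (`LTLSPEC`, `G`, `X`, `!`, `&`) is an assumption about how such a property might be written, since the text does not give the exact form.

```java
import java.util.ArrayList;
import java.util.List;

/** Builds one negated ("trap") LTL property per transition; all names are illustrative. */
final class TransitionTrapProperties {

    /** A transition of a state variable from one value to another, guarded by a condition. */
    static final class Transition {
        final String variable;
        final String from;
        final String to;
        final String guard;

        Transition(String variable, String from, String to, String guard) {
            this.variable = variable;
            this.from = from;
            this.to = to;
            this.guard = guard;
        }
    }

    /**
     * Asserts that the transition can never be taken: it is never the case that the
     * variable has value 'from' while the guard holds and has value 'to' in the next state.
     * A counterexample to this claim is a test case exercising the transition.
     */
    static String trapProperty(Transition t) {
        return "LTLSPEC G !((" + t.variable + " = " + t.from + " & " + t.guard + ")"
                + " & X (" + t.variable + " = " + t.to + "))";
    }

    /** One trap property per transition, which yields transition coverage of the model. */
    static List<String> trapProperties(List<Transition> transitions) {
        List<String> properties = new ArrayList<>();
        for (Transition t : transitions) {
            properties.add(trapProperty(t));
        }
        return properties;
    }
}
```

Generating one property per transition keeps the properties small and independent, at the price of producing many short, possibly overlapping test sequences.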

Several research groups are actively pursuing model checking techniques as 
a means for test case generation. 

Gargantini and Heitmeyer [10] describe a method for generating test se- 
quences from requirements specified in the SCR notation. To derive a test se- 
quence, a trap property is defined which violates some known property of the 
specification. In their work, they define trap properties that exercise each case 
in the event and condition tables available in SCR - this provides a notion of 
branch coverage of an SCR specification. 

Ammann and Black [2,1] combine mutation analysis with model-checking 
based test case generation. They define a specification based coverage metric 
for test suites using the ratio of the number of mutants killed by the test suite 
to the total number of mutants. Their test generation approach uses a model- 
checker to generate mutation adequate test suites. The mutants are produced by 
systematically applying mutation operators to both the property specifications
and the operational specification, producing, respectively, both positive test cases
which a correct implementation should pass, and negative test cases which a
correct implementation should fail.

Rayadurgam, et al. in [20] provide a formalism suitable for structural test- 
case generation using model checkers and in [21] illustrate how this approach can 
be applied to a formal specification language. They also presented a framework 
for specification centered testing in [13]. 

Lee, et al. [16] formulate a theoretical framework for using temporal logic 
to specify data flow test coverage criteria. They also discuss various techniques
for reducing the size of the test set generated by the model checker [15]. The 
underlying argument in all these works, as in our own earlier work, is that when 
test criteria can be appropriately formulated as temporal logic formulas, one 
could use model-checking to produce witnesses for those formulas, which could 
then be seen as test sequences satisfying the coverage criteria. 

However, to our knowledge, not much experimental data is available about 
the efficiency of model-checking based test-generation for realistic systems. Our 
goal is to conduct a series of studies using realistic systems, apply the techniques 
and examine how well these techniques perform and to what extent they scale up. 

3 Case Study Overview 

In our case study, we were interested in answering four questions: 

1. If we naively generate one test case for each structure we want to cover, how 
many test cases will be generated for various coverage criteria? 

2. Does test case generation using symbolic and bounded model checking scale 
to realistic systems? 

3. Where do the test case generation capabilities of symbolic and bounded 
model checking break down? 

4. Can a code model checker, such as JPF, be used to find test cases based on 
realistic code? 

To answer these questions, we devised a rigorous case study evaluating the test 
case generation capabilities of model checkers. We have developed a test case generation
engine integrated in our Nimbus toolset for the development of RSML$^{-e}$
specifications [22] (described in Section 5). This test case generator allows us to
generate test cases to various structural coverage criteria using the NuSMV 
model checker. The coverage criteria we have used in this study are discussed in 
Section 6. 

To stress the capabilities of NuSMV, we wanted to work with models with 
realistic structure as well as realistic size. In a related project, Rockwell Collins 
Inc., in collaboration with the University of Minnesota, have developed a collection
of progressively more complex RSML$^{-e}$ models of the mode logic of a flight
guidance system (FGS). The models range from a very simple “toy version” of
the FGS (FGS00) to a close to production version of the logic (FGS05). The
case example is discussed in some detail in Section 4. 
Table 1. Data on the size of the RSML$^{-e}$ and SMV FGS models

Model    RSML$^{-e}$ (#vars)    SMV (#BDD variables)
FGS00    19                     109
FGS01    27                     167
FGS02    43                     281
FGS03    57                     353
FGS04    90                     615
FGS05    142                    849



We performed the case study by conducting the following steps: 

1. Use Nimbus to automatically generate the LTL properties (trap properties) 
characterizing the test cases needed to satisfy various coverage criteria for 
FGS00 through FGS05.

2. Use the symbolic as well as bounded model checkers provided in NuSMV to 
generate counterexamples for the suites of trap properties. 

3. Automatically process the counterexamples to provide test cases suitable for 
use in a test automation environment. 

During the case study, we collected information on (1) how many test cases were 
generated, (2) run time and memory usage of the model checkers, and (3) the 
average length of the test cases generated. 

To complete the case study, we investigated the feasibility of using a code 
model checker to complete the test suites derived from the formal specification. 
This capability would be used should the specification-based tests not provide 
adequate coverage of the implementation. To this end, we derived Java code from 
the formal specifications, executed a test suite generated from the specification, 
identified branches that were not covered, and derived tests for these branches 
using the Java Pathfinder model checker [25].

In the remainder of this paper we provide a detailed description of the arti- 
facts and activities involved in the case study. 

4 Flight Guidance System 

A Flight Guidance System (FGS) is a component of the overall Flight Control
System (FCS) in a commercial aircraft. It compares the measured state of an
aircraft (position, speed, and altitude) to the desired state and generates pitch and
roll guidance commands to minimize the difference between the measured and
desired state^1. The FGS can be broken down to mode logic, which determines
which lateral and vertical modes of operation are active and armed at any given 
time, and the flight control laws that accept information about the aircraft’s 
current and desired state and compute the pitch and roll guidance commands. 
In this case study we have used the mode logic. 

Figure 2 illustrates a graphical view of an FGS in the Nimbus environment.
The primary modes of interest in the FGS are the horizontal and vertical modes.
The horizontal modes control the behavior of the aircraft about the longitudinal,
or roll, axis, while the vertical modes control the behavior of the aircraft about
the vertical, or pitch, axis. In addition, there are a number of auxiliary modes,
such as half-bank mode, that control other aspects of the aircraft’s behavior.

^1 We thank Dr. Steve Miller and Dr. Alan Tribble of Rockwell Collins Inc. for the
information on flight control systems and for letting us use the RSML$^{-e}$ models
they have developed using Nimbus.

Fig. 2. Flight Guidance System

The FGS is ideally suited for test case generation using model checkers since it 
is discrete - the mode logic consists entirely of enumerated and Boolean variables. 
As mentioned earlier, we used six models that are of progressively increasing 
complexity. An indication of the model size can be found in Table 1. The measure
for the RSML$^{-e}$ models refers to the number of state variables in the RSML$^{-e}$
model, and the measure for the SMV models refers to the number of BDD variables
needed to encode the model.

5 Nimbus and RSML$^{-e}$

Figure 3 shows an overview of the Nimbus tools framework we have used as a
basis for our test case generation engine. The user builds a behavioral model of
the system in the fully formal and executable specification language RSML$^{-e}$
(see below). After evaluating the functionality and behavioral correctness of the
specification using the Nimbus simulator, users can translate the specifications
to the PVS or NuSMV input languages for verification (or test case generation as
is the case in this report). The set of LTL trap properties required to use NuSMV
to generate test sequences are obtained by traversing the abstract syntax tree
in Nimbus and then outputting sets of properties whose counterexamples will
provide the correct coverage (the coverage criteria and associated properties are
discussed in the next section).

Fig. 3. Verification Framework

To generate test cases in Nimbus, the user would invoke the following steps: 

Model creation and trap property generation: The formal model in 
NuSMV can be generated automatically from the RSML$^{-e}$ specification from
the Nimbus command line. The test criterion is specified as a command line 
argument when building the NuSMV model. The result of this command 
is an SMV model of the system and a collection of trap properties whose 
counterexamples will provide the desired coverage. 

Counterexample generation using NuSMV: The model and the trap prop- 
erties are merged and given to the NuSMV tool. A Unix script invokes the 
NuSMV tool in interactive mode, reads the model, flattens the hierarchy, 
encodes the variables, and checks the specifications for the trap properties. 
After the script completes, the counterexample traces
for all trap properties have been collected in a text file.

Concrete test case generation from NuSMV Output: For any counterex- 
ample, the trace information from NuSMV contains only delta changes in 
each subsequent state following the initial state. Therefore, to generate test 
sequences, we need to remember the value of the variables in the initial state 
configuration so that we can construct usable test cases by applying the 
delta changes to the initial configuration. The processing of the counterexamples
and generation of an intermediate test representation is currently
achieved with a simple piece of software implemented in C. The intermediate
test representation contains (1) the input in each step, (2) the expected
state changes (to state variables internal to the RSML$^{-e}$ model), and (3)
the expected outputs (if any).
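
The delta-expansion step of the last item can be sketched as follows. The authors' tool is written in C and its data format is not shown, so this Java version is only an assumption-laden illustration: the class `DeltaTrace`, the method `expand`, and the representation of a state as a map from variable names to values are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Rebuilds complete states from an initial state plus per-step delta changes. */
final class DeltaTrace {

    /**
     * @param initial full variable assignment of the initial state
     * @param deltas  for each later step, only the variables whose values changed
     * @return the full assignment of every state in the trace, initial state first
     */
    static List<Map<String, String>> expand(Map<String, String> initial,
                                            List<Map<String, String>> deltas) {
        List<Map<String, String>> states = new ArrayList<>();
        Map<String, String> current = new LinkedHashMap<>(initial); // copy, input stays untouched
        states.add(new LinkedHashMap<>(current));
        for (Map<String, String> delta : deltas) {
            current.putAll(delta);                    // overwrite only what changed
            states.add(new LinkedHashMap<>(current)); // snapshot of the complete state
        }
        return states;
    }
}
```

Each returned map is an independent snapshot, so later processing (for example, splitting a state into inputs, expected state changes and expected outputs) cannot corrupt earlier states.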

The Nimbus tools discussed above all operate on the RSML$^{-e}$ notation.
RSML$^{-e}$ is based on the Statecharts-like [12] language Requirements State
Machine Language (RSML) [18]. RSML$^{-e}$ is a fully formal and synchronous
dataflow language without any internal broadcast events (the absence of events is
indicated by the $^{-e}$).

An RSML$^{-e}$ specification consists of a collection of input variables, state
variables, input/output interfaces, functions, macros, and constants; input vari- 
ables are used to record the values observed in the environment, state variables 
are organized in a hierarchical fashion and are used to model various states of the 
control model, interfaces act as communication gateways to the external environ- 
ment, and functions and macros encapsulate computations providing increased 
readability and ease of use. 

Figure 4 shows a fragment of an RSML$^{-e}$ specification of the
Flight Guidance System^2. The figure shows the definition of a state variable,
ROLL. ROLL is the default lateral mode in the FGS mode logic. 

The conditions under which the state variable changes value are defined in 
the TRANSITION clauses in the definition. The condition tables are encoded in 
the macros Select_ROLL and Deselect_ROLL. The tables are adopted from the
original RSML notation: each column of truth values represents a conjunction
of the propositions in the leftmost column (F represents the negation of the
proposition and * represents a “don’t care” condition). If a table contains
several columns, we take the disjunction of the columns; thus, the table is a way
of expressing conditions in a disjunctive normal form.
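
The reading of a table as a disjunction of column-wise conjunctions can be made concrete with a small evaluator. The following Java sketch is an illustration under our own encoding, not Nimbus code: the class `AndOrTable` is hypothetical, each column is given as a string over 'T', 'F' and '*' with one character per proposition (row), and the truth values of the propositions are supplied by the caller.

```java
/** Evaluates an AND/OR table as a formula in disjunctive normal form. */
final class AndOrTable {

    /**
     * @param columns columns[j].charAt(i) is the entry ('T', 'F' or '*') for proposition i
     *                in column j
     * @param values  current truth values of the propositions, in row order
     * @return true if at least one column is satisfied by the given values
     */
    static boolean evaluate(String[] columns, boolean[] values) {
        for (String column : columns) {
            boolean columnHolds = true;
            for (int i = 0; i < values.length; i++) {
                char entry = column.charAt(i);
                if (entry == 'T' && !values[i]) {
                    columnHolds = false; // proposition required to hold, but it does not
                }
                if (entry == 'F' && values[i]) {
                    columnHolds = false; // proposition required to be false, but it holds
                }
                // '*' is a "don't care" entry and places no constraint on proposition i
            }
            if (columnHolds) {
                return true; // the disjunction needs only one satisfied column
            }
        }
        return false;
    }
}
```

In this encoding the two-column table of Deselect_ROLL in Figure 4 becomes `{"T*", "*T"}`, which is true as soon as either of its two propositions holds, while the single-column table of Select_ROLL becomes `{"TT"}`.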

6 Coverage Criteria 

For the case study described in this report, we have selected three representative
specification coverage criteria: state coverage, decision coverage (in the
RSML$^{-e}$ context called table coverage), and a version of MC/DC coverage [4]
called clause-wise condition coverage.

In the following discussion, a test case is to be understood as a sequence
of values for the input variables in an RSML$^{-e}$ specification. This sequence of
inputs will guide the RSML$^{-e}$ specification from its initial state to the structural
element, for example, a transition, the test case was designed to cover. A test
suite is simply a set of such test cases. As we briefly explained, trap properties
are used to generate counter-examples using a model checker. These properties 
are derived from the structural coverage criteria. For the purposes of illustration, 
we use the FGS example discussed in Section 4. 

^2 We use here the ASCII version of RSML$^{-e}$ since it is much more compact than the
more readable typeset version.

STATE_VARIABLE ROLL : Base_State
    PARENT : Modes.On
    INITIAL_VALUE : UNDEFINED
    CLASSIFICATION : State

    TRANSITION UNDEFINED TO Cleared IF NOT Select_ROLL()
    TRANSITION UNDEFINED TO Selected IF Select_ROLL()
    TRANSITION Cleared TO Selected IF Select_ROLL()
    TRANSITION Selected TO Cleared IF Deselect_ROLL()
END STATE_VARIABLE

MACRO Select_ROLL() :
    TABLE
        Is_No_Nonbasic_Lateral_Mode_Active() : T;
        Modes = On                           : T;
    END TABLE
END MACRO

MACRO Deselect_ROLL() :
    TABLE
        When_Nonbasic_Lateral_Mode_Activated() : T *;
        When (Modes = Off)                     : * T;
    END TABLE
END MACRO

Fig. 4. A small portion of the FGS specification in RSML$^{-e}$



6.1 State Coverage 

Definition 1. A test suite is said to achieve state coverage of a state variable 
in an RSML-e specification, if for each possible value of the state variable there 
is at least one test case in the test suite that assigns that value to the given 
variable. The test suite achieves state coverage of the specification if it achieves 
state coverage for each state variable. 

Consider, for example, the state variable ROLL in the FGS specification ex- 
ample: 

STATE_VARIABLE ROLL : { Cleared, Selected, UNDEFINED }; 

A test suite would achieve state coverage on ROLL, if for each of its three 
different possible values, there is a test case in which ROLL takes that value. Note 
that a single test case might actually achieve this coverage by assigning different 
values to ROLL at different points in the sequence. To provide a comprehensive 
test suite, however, in this case study we generate one test case for each state 
variable value. One could use the following LTL formulas to generate the test 
cases: 

1. G ¬(ROLL = Cleared) 

2. G ¬(ROLL = Selected) 

3. G ¬(ROLL = UNDEFINED) 

In each case, the property asserts that ROLL can never have a specific value and 
the counter-example produced is a sequence of values for the system variables 
starting from an initial state and ending in a state where ROLL has the specific 
value. 
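
Generating these trap properties is mechanical. The sketch below shows one way to do it, assuming negation is written as `!` (the NuSMV convention); the function name is hypothetical and this is not the generator used in the case study.

```python
from typing import List


def state_coverage_traps(variable: str, values: List[str]) -> List[str]:
    """Return one LTL trap property per value: 'the variable never takes it'."""
    return [f"G !({variable} = {value})" for value in values]


traps = state_coverage_traps("ROLL", ["Cleared", "Selected", "UNDEFINED"])
for trap in traps:
    print(trap)
```

Each string is a property that the model checker is expected to refute, and each refutation is a candidate test case.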

6.2 Decision Coverage (Table Coverage) 

Definition 2. A test suite is said to achieve decision coverage of a given 
RSML-e specification, if each guard condition (specified as either an AND/OR 
table or as a standard Boolean expression) evaluates to true at some point in 
some test case and evaluates to false at some point in some other test case in 
the test suite. 

We also refer to this coverage criterion as table coverage since AND/OR 
tables typically are used for every decision in RSML-e. As an example, consider 
the transitions defined for the ROLL state variable in Figure 4. 

If we consider the transitions to Selected guarded by the condition encapsulated 
in the Select_ROLL() macro, test cases to provide decision coverage of this decision 
can be generated using the following two trap properties. 

1. G((Select_ROLL()) -> ¬(ROLL = Selected)) 

2. G(¬(Select_ROLL()) -> (ROLL = Selected)) 
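
Definition 2 can also be read as a check on an existing test suite. The following sketch assumes that, for each test case, the truth value of one guard has been recorded at every step; the data layout and function name are hypothetical. It follows the definition literally and asks for the true and the false outcome in two different test cases.

```python
from typing import List


def achieves_decision_coverage(guard_traces: List[List[bool]]) -> bool:
    """guard_traces[i] lists the guard's value at each step of test case i."""
    true_tests = {i for i, trace in enumerate(guard_traces) if any(trace)}
    false_tests = {i for i, trace in enumerate(guard_traces)
                   if not all(trace)}
    # Need one test showing True and a different test showing False.
    return any(i != j for i in true_tests for j in false_tests)
```

A suite covers the whole specification in this sense only if the check succeeds for every guard.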

6.3 Clause-Wise Transition Coverage 

Finally, to exercise the approach with a complex test coverage criterion, we look 
at the code-based coverage criterion called modified condition/decision coverage 
(MC/DC) and define a similar criterion. MC/DC was developed to meet 
the need for extensive testing of complex boolean expressions in safety-critical 
applications [4]. Ideally, one should test every possible combination of values 
for the conditions, thus achieving compound condition coverage. Nevertheless, 
the number of test cases required to achieve this grows exponentially with the 
number of conditions and hence becomes huge or impractical for systems with 
tens of conditions per decision point. MC/DC was developed as a practical and 
reasonable compromise between decision coverage and compound condition cov- 
erage. It has been in use for several years in the commercial avionics industry. 
A test suite is said to satisfy MC/DC if executing the test cases in the test suite 
will guarantee that: 

— every point of entry and exit in the program has been invoked at least once, 

— every basic condition in a decision in the program has taken on all possible 
outcomes at least once, and 

— each basic condition has been shown to independently affect the decision’s 
outcome 

where a basic condition is an atomic Boolean-valued expression that cannot be 
broken into Boolean sub-expressions. A basic condition is shown to independently 
affect a decision's outcome by varying only that condition while holding 
all other conditions at that decision point fixed. Thus, a pair of test cases must 
exist for each basic condition in the test suite to satisfy MC/DC. However, test 
case pairs for different basic conditions need not necessarily be disjoint. In fact, 
the size of an MC/DC-adequate test suite can be as small as N + 1 for a decision 
point with N conditions. 
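
The independence requirement and the N + 1 bound can be illustrated by brute force on a small decision. The sketch below enumerates all pairs of assignments that differ in exactly one condition and change the outcome; the function name is hypothetical, and exhaustive enumeration is used only because the example is tiny.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

Assignment = Tuple[bool, ...]


def independence_pairs(
    decision: Callable[[Assignment], bool], n: int
) -> Dict[int, List[Tuple[Assignment, Assignment]]]:
    """For each condition, list assignment pairs that differ only in that
    condition and produce different decision outcomes."""
    pairs: Dict[int, List[Tuple[Assignment, Assignment]]] = {
        i: [] for i in range(n)
    }
    for assignment in product([False, True], repeat=n):
        for i in range(n):
            if assignment[i]:
                continue  # start from the False side so each pair appears once
            flipped = assignment[:i] + (True,) + assignment[i + 1:]
            if decision(assignment) != decision(flipped):
                pairs[i].append((assignment, flipped))
    return pairs


# Decision with N = 3 conditions: c0 AND c1 AND c2.
pairs = independence_pairs(all, 3)
suite = {test for plist in pairs.values() for pair in plist for test in pair}
print(len(suite))  # 4 distinct test cases, i.e. N + 1
```

For a pure conjunction every pair shares the all-true assignment, which is why the pairs overlap and four test cases suffice for three conditions.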

If we think of the system as a realization of the specified transition relation, 
it evaluates each guard on each transition to determine which transitions are 
enabled and thus each guard becomes a decision point. The predicates in turn 
are constructed from clauses - the basic conditions. 

Definition 3. A test suite is said to achieve clause-wise transition coverage 
(CTC) for a given transition of a variable in an RSML-e specification, if every 
basic Boolean condition in the transition guard is shown to independently affect 
the transition. 

Consider the following transition example adapted from an avionics system 
related to the FGS: 



EQUALS PowerOn IF 
    TABLE 
        PREV_STEP(DOI) IN_STATE AttemptingOn         : F T 
        PREV_STEP(DOI) IN_ONE_OF {PowerOff, Unknown} : T F 
        DOIStatus = On                               : T T 
        AltitudeStatus IN_STATE Below                : T * 
        ivReset                                      : F F 
    END TABLE 



To show that each of the basic conditions in the rows independently affects 
the transition, one should produce a set of test cases in which for any given basic 
condition there are two test cases, such that one makes the basic condition true 
and the other makes it false, the rest of the basic conditions have the same truth 
values in both test cases, and in one test case the transition is taken while in 
the other it is not. For the purposes of this example, let us just consider the first 
column. We may generate the trap properties by examining the truth value for 
each row in the first column as follows: 

where Ri stands for the basic condition in the i-th row of the table and POST 
represents the post-state condition DOI = PowerOn. 

MC/DC test cases come in pairs, one where the atomic condition evaluates 
to false and one where it evaluates to true, but no other atomic conditions in 
the Boolean expression are changed. In the example above, trap properties 0 
and 1 provide coverage of R1. Unfortunately, the model checking approach to 
test case generation is incapable of capturing such constraints over two test 
sequences. To work around this problem, we have developed a novel alternative 
that leverages a model checker for complete and accurate MC/DC test case 
generation. We automatically rewrite the system model by introducing a small 
number of auxiliary variables to capture the constraints that span more than 
one test-sequence. We also introduce a special system reset transition to restore 
a system to its initial state. With these small modifications, a test constraint 
spanning two sequences in the original model can be expressed as a constraint 
on a single test-sequence in the modified model. Model-checking techniques can 
then be employed to generate this single test sequence which can be later factored 
into two separate test-sequences for the original model satisfying the actual test 
criteria. This process has been fully automated and was used to generate the 
MC/DC-like tests in this case study. 
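
The last step, factoring one counterexample of the modified model into two test sequences for the original model, might look roughly as follows. This is a sketch under stated assumptions: the trace is taken to be a list of per-step input assignments, the reset transition is assumed to be visible as a Boolean input named `reset`, and the reset step itself is assumed to carry no test input. None of these details are given in the text.

```python
from typing import Dict, List, Tuple

Step = Dict[str, bool]


def split_at_reset(
    trace: List[Step], reset: str = "reset"
) -> Tuple[List[Step], List[Step]]:
    """Split one trace into the sequences before and after the first reset."""

    def drop_reset(step: Step) -> Step:
        # The auxiliary reset input does not exist in the original model.
        return {name: value for name, value in step.items() if name != reset}

    for k, step in enumerate(trace):
        if step.get(reset, False):
            first = [drop_reset(s) for s in trace[:k]]
            second = [drop_reset(s) for s in trace[k + 1:]]
            return first, second
    raise ValueError("trace contains no reset step")
```

The two returned sequences would then be run separately, each from the initial state of the original model.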

To summarize, we have automated the generation of trap properties for a collection 
of structural coverage criteria of formal specifications. In this case study 
we are using the three representative criteria described above: state coverage, 
decision coverage, and clause-wise condition coverage. The experimental results 
of using model checkers to generate test suites for these coverage criteria are 
presented next. 

7 Experimental Results and Discussion 

The results of our case study are presented in Tables 2 through 4. Table 2 pro- 
vides a count of the number of trap properties generated for each FGS model for 
each coverage criterion. Note that this number reflects the naive generation of 
trap properties - we simply generate one trap property for each structural ele- 
ment we aim to cover. Naturally, the desired coverage can typically be achieved 
with substantially fewer test cases - see discussion later in this section. Recall, 
however, that the aim of this case study was not to provide a minimal set of test 
cases providing the desired coverage, but instead to evaluate the scalability of 
using model checking techniques for test case generation - thus, we wanted to 
work with many properties to make our results representative of expected per- 
formance. Finally, note that each trap property for MC/DC coverage describes 
an MC/DC pair of test cases - we will have twice as many MC/DC test cases 
as we have trap properties. 

Table 3 gives the performance figures in terms of time and memory of generating 
the suites to the three coverage criteria (a "-" in the table indicates that 
a run of the model checker was terminated after an excessively long run - more 
than 24 hours). 

Table 2. Trap Properties generated per test criterion for all FGS Models 
[Table body largely lost in extraction. Surviving entries: State Coverage: 246; 
Table Coverage: 68, 342; MC/DC Coverage: 32, 323.] 

Table 3. Execution Times and Memory Usage for all FGS Models 
(Search depth of the bounded model checker in parentheses; "-" marks a 
terminated run; "?" marks an entry lost in extraction. The FGS00 state coverage 
row is only partly preserved.) 

State Coverage 
Model   Symbolic time   Symbolic memory   Bounded time   Bounded memory (depth) 
FGS00   ?               ?                 ?              1 (10) 
FGS01   4.58            11                0.61           11 (10) 
FGS02   33.86           21                1.76           15 (10) 
FGS03   251.37          27                3.74           19 (10) 
FGS04   -               -                 57.89          32 (12) 
FGS05   -               -                 81.61          58 (12) 

Table Coverage 
Model   Symbolic time   Symbolic memory   Bounded time   Bounded memory (depth) 
FGS00   2.04            10                0.79           10 (10) 
FGS01   7.3             15                2.14           14 (10) 
FGS02   82.34           24                5.94           21 (10) 
FGS03   469.91          33                11.99          29 (10) 
FGS04   -               -                 101.2          53 (12) 
FGS05   -               -                 193.67         80 (12) 

MC/DC Coverage 
Model   Symbolic time   Symbolic memory   Bounded time   Bounded memory (depth) 
FGS00   1.72            10                3.55           13 (5) 
FGS01   6.72            15                11.4           19 (5) 
FGS02   78.2            24                51.76          30 (5) 
FGS03   520.95          32                137.34         49 (5) 
FGS04   -               -                 39167.8        110 (5) 
FGS05   -               -                 46196.91       165 (5) 

As the data in Table 3 illustrate, symbolic model checking does not seem 
to scale well beyond FGS03. For models FGS04 and FGS05, it quickly runs into 
problems. From Table 3 it is clear that memory usage is not the problem. To keep 
memory usage small, we are using the dynamic BDD variable reordering feature 
of NuSMV - without this option, NuSMV would exhaust the available memory 
quickly. Nevertheless, the dynamic variable reordering is quite costly and this 
deteriorates the performance of NuSMV to a point where the time to reorder 
becomes unbearable. In addition, the cost of constructing counterexamples in a 
symbolic model checker becomes a serious issue when the model checker is used 
for test case generation since we need a large number of counterexamples. 

The bounded model checker, on the other hand, scales well to all FGS Models. 
To determine the search depth for the bounded model checker, we used results 
from a previous study using symbolic model checking for verification of the FGS 
system models [5]. In this previous study, we found that the full state space of 
FGS00 through FGS03 could be explored with 5 steps and with 12 steps in FGS04 
and FGS05. Therefore, when generating state and table coverage, we simply used 
the default setting of 10 steps for FGS00 through FGS03 and extended it to 12 
for FGS04 and FGS05. We attempted the same settings when generating MC/DC 
coverage, but the time required to search to this depth was simply unacceptable. 
Note here that the majority of the time was spent searching for test cases that 
are infeasible - a certain MC/DC pair did not exist. Searching to depth 12 for 
such non-existent test cases is counterproductive. Instead, we observed that the 
average test case length is quite short (Table 4 shows just a little over 1 for table 
coverage) and we simply set the search depth to a prudent 5. We expected this 
to assure that we found a large number of test cases, but did not waste any time 
searching for the ones that did not exist. Naturally, we may still miss some test 
cases that are longer than 5 should they exist (see discussion below). 

As can be seen from the performance data for the bounded model checker in 
Table 3, even with a reduced search depth, the performance deteriorated quite 
notably when generating tests for MC/DC coverage (orders of magnitude slower 
than for the other coverage criteria). Two factors contribute to this phenomenon: 
(1) the length of the test sequences generated and (2) the complexity of the LTL 
properties to check. 

Table 4 shows the average test case length we measured during our experi- 
ments. From the results it is clear that the test cases for MC/DC coverage were 
approximately three times as long as the ones for the other coverage criteria. 
Recall the short discussion on MC/DC in Section 6. The counterexample gener- 
ated for an MC/DC trap property describes not one test case, but an MC/DC 
test case pair - the first test case takes a transition t out of state X with a 
particular truth assignment to the basic conditions, the second takes us back to 
state X but this time we have exactly one basic condition with a different truth 
assignment and we do not take transition t. Thus, the test case length is destined 
to be approximately twice the length of the test cases generated for the other 
criteria. The need for a deeper search dramatically decreases the performance of 
the bounded model checker. 



Table 4. The average length of the test cases generated 
[Table body largely lost in extraction. Surviving entries: 3.1, 1.0, 1.2, 2.9, 
1.0, 1.1.] 

In addition to the increased test case length, the LTL properties characterizing 
the test cases are significantly more complex for MC/DC coverage than for 
the other coverage criteria (again, see Section 6). The dramatically longer trap 
properties negatively affect the performance of the bounded model checker [24]. 

From this discussion we can conclude that a bounded model checker seems 
to be a suitable tool for test case generation from formal specifications; it scales 
well to systems of industrial relevance, it generates the shortest possible test 
cases, and it is fully automated. There are, however, some drawbacks. Most 
importantly, if the shortest test case needed to cover a specific feature in the 
model is longer than the search depth of the bounded model checker, we have 
no way of telling if the test case simply does not exist or if it is longer than the 
search depth. This is an issue particularly for MC/DC generation where there 
are a fair number of MC/DC pairs that simply do not exist - if the bounded 
model checker fails to find a test case, the determination if it indeed exists is 
now a manual process. 
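
The ambiguity of a failed bounded search can be shown on a toy example. The sketch below uses an explicit breadth-first search with a depth limit as a stand-in; it is not the SAT-based procedure of a real bounded model checker, and all names are hypothetical. The point is only that a result of `None` does not distinguish an unreachable target from one that lies beyond the bound.

```python
from collections import deque
from typing import Callable, Hashable, Iterable, List, Optional


def bounded_search(
    initial: Hashable,
    successors: Callable[[Hashable], Iterable[Hashable]],
    is_target: Callable[[Hashable], bool],
    depth: int,
) -> Optional[List[Hashable]]:
    """Return a shortest path to a target within `depth` steps, else None."""
    frontier = deque([(initial, [initial])])
    seen = {initial}
    while frontier:
        state, path = frontier.popleft()
        if is_target(state):
            return path
        if len(path) - 1 == depth:
            continue  # bound reached; do not expand further
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, path + [nxt]))
    return None


def step(n: int) -> List[int]:
    return [n + 1]  # toy system: a counter that only increments


print(bounded_search(0, step, lambda n: n == 7, 5))   # None: too deep
print(bounded_search(0, step, lambda n: n == -1, 5))  # None: unreachable
```

Both calls return `None` at depth 5, although the first target is found as soon as the bound is raised to 7 or more.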

As mentioned previously, during the generation of the test suites we did not 
attempt to minimize the number of test cases to achieve a desired coverage. In 
fact, it is easy to see that our test case generation approach, where a test case 
is generated for each trap property, will lead to a large amount of duplicate 
coverage. 

We performed a simple analysis to measure the level of duplication for the 
test suite generated to achieve table coverage - we simply executed the tests 
(sequentially) and kept track of the coverage achieved after each test. Most test 
cases did not increase the coverage of the test suite - a clear indication that tests 
are redundant with respect to achieving coverage. Some tests, on the other hand, 
produced a “jump” in the coverage indicating that they exercise a new portion 
of the software. We created a reduced test suite using the tests that caused this 
jump in the coverage. Although this is not necessarily the minimal set to achieve 
the desired coverage, we found the difference in size between the initial set and 
this set to be such that it clearly indicated a large degree of duplication over the 
generated tests. For example, we found that for FGS00 and FGS03, respectively, 
only 3 out of 27 and 11 out of 95 test cases were required to achieve table 
coverage. Since the cost of resetting a system and executing a new test case in 
some applications is high, identifying a small test suite that provides adequate 
coverage is of some importance. In future work we will investigate an iterative 
approach to the test case generation in order to achieve smaller test suites. Of 
course, a smaller test suite might achieve the same coverage, but it may reduce 
the defect detection capability of the test suite. We have just initiated a study 
to investigate how test suite size impacts the defect detection capability of the 
suite. 
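
The reduction described above, keeping only the tests that cause a jump in coverage, can be sketched as follows. The data layout (each test paired with the set of items it covers) and the names are assumptions made for illustration; as noted in the text, the result depends on execution order and is not necessarily minimal.

```python
from typing import Hashable, List, Set, Tuple


def reduce_suite(
    tests: List[Tuple[str, Set[Hashable]]]
) -> Tuple[List[str], Set[Hashable]]:
    """Run tests in order and keep those that add new coverage."""
    covered: Set[Hashable] = set()
    kept: List[str] = []
    for name, items in tests:
        if not items <= covered:  # the test covers something new
            kept.append(name)
            covered |= items
    return kept, covered


example = [
    ("t1", {"a", "b"}),
    ("t2", {"a"}),        # redundant after t1
    ("t3", {"b", "c"}),   # adds c
    ("t4", {"c"}),        # redundant after t3
]
kept, covered = reduce_suite(example)
print(kept)  # ['t1', 't3']
```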

As mentioned in Section 2, we intended our test case generation framework 
to allow an analyst to generate test sequences from a formal specification and 
then run the tests on the implementation. Should additional tests be needed, 
we would like to generate the additional input sequences from the code. To this 
effect we evaluated an explicit state code model checker. 

8 Java PathFinder Results 

We have done some preliminary experiments on using the Java PathFinder (JPF) 
code-level model checker [25] to do test case generation on Java programs 
automatically generated from the RSML-e models. Our initial motivation for using 
a code-level model checker was to investigate whether one can use such a tool 
to discover test cases for covering code that was not being covered by the test 
cases derived from the RSML-e specification. Here, however, since we are doing 
an automatic translation of RSML-e to Java, we will use our preliminary results 
to judge how an explicit-state model checker (such as JPF) compares to 
the symbolic and bounded model checker approaches for test case generation 
for RSML-e. We studied test cases for branch coverage at the Java level, since 
branch coverage is an often used code coverage criterion and, due to the translation 
used, corresponds closely to table (decision) coverage at the RSML-e level. 

The test case generation process using JPF is currently not automated; in 
particular, the trap properties are assertions added by hand, and there is no 
facility to extract the test inputs from each counterexample produced. We therefore 
will not be reporting any specific timing and memory usage results, but rather 
make general observations. In short, the explicit-state model checker did not 
perform as well as the symbolic and bounded model checking approaches. For 
FGS00 the model checker could generate test cases to cover all branches within 
a matter of seconds while using an insignificant amount of memory (less than 
1 Mb). For FGS03, however, it could not generate enough of the state space to 
cover all the branches in a reasonable amount of time (3 hours). We did not 
attempt to generate tests for FGS04 or FGS05. 

From the experiments it is clear that explicit-state model checking is 
particularly sensitive to the length of the test cases required to achieve the 
desired coverage. For example, FGS03 has 10 Boolean inputs at each input cycle 
(i.e., 2^10 options), and the explicit-state model checker can at most deal with 3 
such inputs. 

Explicit-state model checking does however allow more control over the 
search, and we conjecture this can be exploited to do efficient test case genera- 
tion. Specifically, one can use heuristic search [11] techniques to find the desired 
test cases - we will pursue this line of research in future work. Recently, the 
idea of combining symbolic execution with model checking to do test case gen- 
eration has been proposed [17] - this allows one to mitigate the effect of longer 
test cases and should therefore allow for more efficient test case generation. This 
latter approach is in some ways similar to doing bounded model checking, and 
we will investigate how these techniques compare in future work. 

9 Summary and Conclusions 

To summarize, we have conducted a series of case studies evaluating how well 
model checking techniques will scale when used for test case generation. Our 
experiences point out limitations of symbolic as well as explicit state model 
checkers. A bounded model checker, however, performed very well and shows 
tremendous promise. The domain of interest in our study has been safety-critical 
reactive systems - systems that lend themselves to modeling with various 
formalisms based on finite state machines. In this domain, test cases providing 
common coverage seem to be quite short, thus making bounded model checkers 
perform very well. 

Naturally, there are still many challenges to address. There are systems where 
the cost of restarting the system to execute a new test sequence is quite high. In 
this situation it is highly desirable to have long test cases that provide extensive 
coverage so that we can minimize the number of system restarts required 
to execute the test suite. The bounded model checking approach that performed 
well in our case study provides the exact opposite - we will get many 
very short test cases. Techniques to effectively merge these test cases into longer 
test sequences would be highly desirable. Alternatively, techniques based on 
explicit state model checking and heuristic searches may be able to provide long 
test cases that provide extensive coverage of a model. We plan to investigate 
this approach in the context of Java PathFinder shortly. 

The nature of bounded model checking makes it unsuitable for verification. 
Determining the appropriate search depth to guarantee that we find most (if 
not all) test cases without wasting time on deep searches for test cases that do 
not exist remains a challenge. 

Our case study example is atypical in that it only contains discrete variables; 
we have no integer or real variables in the model. Naturally, many models 
will have numerous numeric variables involved in various interrelated numeric 
constraints. Applying the model checking techniques to these systems will be a 
challenge. Recent advances bringing efficient decision procedures and bounded 
model checking together promise to help to some extent. Various abstraction 
techniques, for example, iterative refinement [14,8] and domain reduction 
abstraction [6], also hold promise in this regard. We hope to conduct experiments 
on systems with these characteristics shortly. 

Abstract. Generating effective tests and inferring likely program specifications 
are both difficult and costly problems. We propose an approach in which we 
can mutually enhance the tests and specifications that are generated by itera- 
tively applying each in a feedback loop. In particular, we infer likely specifica- 
tions from the executions of existing tests and use these specifications to guide 
automatic test generation. Then the existing tests, as well as the new tests, are 
used to infer new specifications in the subsequent iteration. The iterative proc- 
ess continues until there is no new test that violates specifications inferred in 
the previous iteration. Inferred specifications can guide test generation to focus 
on particular program behavior, reducing the scope of analysis; and newly gen- 
erated tests can improve the inferred specifications. During each iteration, the 
generated tests that violate inferred specifications are collected to be inspected. 
These violating tests are likely to have a high probability of exposing faults or 
exercising new program behavior. Our hypothesis is that such a feedback loop 
can mutually enhance test generation and specification inference. 



1 Introduction 

There are a variety of software quality assurance (SQA) methods being adopted in 
practice. Since there are particular dependences or correlations among some SQA 
methods, these methods could be integrated synergistically to provide value consid- 
erably beyond what the separate methods can provide alone [28, 32, 35, 11]. Two 
such exemplary methods are specification-based test generation and dynamic specifi- 
cation inference. Specification-based test generation requires specifications a priori 
[13, 25, 5]. In practice, however, formal specifications are often not written for pro- 
grams. On the other hand, dynamic specification inference relies on good tests to 
infer high quality specifications [10, 31, 21]. There is a circular dependency between 
tests in specification-based test generation and specifications in dynamic specification 
inference. 

In addition, when formal specifications are not available, automatic test genera- 
tion, such as white-box test generation or random test generation, does not suffi- 
ciently address output checking. Without specifications, output checking is limited to 
detecting a program crash or an exception that is thrown but not caught. In other words, 
there is a lack of test oracles in automatic test generation without specifications a 
priori. 

In this research, without a priori specifications, we want to mutually enhance test 
generation and specification inference. At the same time, from a large number of 
generated tests, we want to have a way to identify valuable tests for inspection. Valu- 
able tests can be fault-revealing tests or tests that exercise new program behavior. The 
solution we propose is a method and tools for constructing a feedback loop between 
test generation and specification inference, using and adapting existing specification- 
based test generation and dynamic specification inference techniques. We implement 
the method for three types of inferred specifications: axiomatic specifications [10], 
protocol specifications [31], and algebraic specifications [21]. We demonstrate the 
usefulness of the method by initially focusing on the unit test generation and the 
specification inference for object-oriented components, such as Java classes. 
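
The control flow of such a feedback loop can be sketched as below. The three callables stand in for a specification inference tool, a specification-based test generator, and a violation check; their names and signatures are hypothetical, and the iteration cap is an added safeguard rather than part of the described method.

```python
from typing import Callable, List, Tuple, TypeVar

Test = TypeVar("Test")
Spec = TypeVar("Spec")


def feedback_loop(
    tests: List[Test],
    infer_specs: Callable[[List[Test]], Spec],
    generate_tests: Callable[[Spec], List[Test]],
    violates: Callable[[Test, Spec], bool],
    max_iterations: int = 10,
) -> Tuple[Spec, List[Test]]:
    """Alternate inference and generation until no new test violates the
    specifications inferred in the previous iteration."""
    tests = list(tests)
    to_inspect: List[Test] = []
    specs = infer_specs(tests)
    for _ in range(max_iterations):
        new_tests = [t for t in generate_tests(specs) if t not in tests]
        violating = [t for t in new_tests if violates(t, specs)]
        if not violating:
            break
        to_inspect.extend(violating)   # collected for developer inspection
        tests.extend(new_tests)
        specs = infer_specs(tests)     # existing plus new tests
    return specs, to_inspect


# Toy instantiation: tests are integers in 0..5, the "specification" is the
# interval of inputs seen so far, and generation probes just outside it.
specs, inspect = feedback_loop(
    [2, 3],
    infer_specs=lambda ts: (min(ts), max(ts)),
    generate_tests=lambda s: [t for t in (s[0] - 1, s[1] + 1) if 0 <= t <= 5],
    violates=lambda t, s: not (s[0] <= t <= s[1]),
)
print(specs, inspect)
```

In the toy run the interval widens each iteration until the generator can no longer produce an input outside it, at which point the loop stops.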



2 Background 

2.1 Formal Specifications 

A formal specification expresses the desired behavior of a program. We model the 
specification in a style of requires/ensures. Requires describe the constraints of using 
APIs provided by a class. When requires are satisfied during execution, ensures de- 
scribe the desired behavior of the class. Requires can be used to guard against illegal 
inputs, and ensures can be used as test oracles for correctness checking. 

Axiomatic specifications [22] are defined in the granularity of a method in a class 
interface. Preconditions for a method are requires for the method, whereas post- 
conditions for a method are ensures for the method. Object invariants in axiomatic 
specifications can be viewed as the pre/post-conditions for each method in the class 
interface. The basic elements in requires/ensures consist of method arguments, re- 
turns, and class fields. 

Protocol specifications [6] are defined in the granularity of a class. Requires are 
the sequencing constraints in the form of finite state machines. Although extensions 
to protocol specifications can describe ensures behavior, there are no ensures in basic 
protocol specifications. The basic elements in requires consist of method calls, in- 
cluding method signatures, but usually no method arguments or returns. 
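
A protocol specification of this kind can be pictured as a transition table over method names. The sketch below uses a hypothetical file-like class with methods `open`, `read`, and `close`; the states and the table are invented for illustration.

```python
from typing import Dict, List, Tuple

# (current state, method called) -> next state
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("closed", "open"): "opened",
    ("opened", "read"): "opened",
    ("opened", "close"): "closed",
}


def satisfies_protocol(calls: List[str], start: str = "closed") -> bool:
    """Check a method call sequence against the sequencing constraints."""
    state = start
    for method in calls:
        key = (state, method)
        if key not in TRANSITIONS:
            return False  # call not allowed in this state
        state = TRANSITIONS[key]
    return True


print(satisfies_protocol(["open", "read", "close"]))  # True
print(satisfies_protocol(["read"]))                   # False
```

Only method names appear in the table, matching the observation that arguments and return values are usually not part of the requires.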

Algebraic specifications [18] are also defined in the granularity of a class. Ensures 
are the AND combination of all axioms in algebraic specifications. In the AND com- 
bination, each axiom, in the form of LHS=RHS, is interpreted as “if a current call 
sequence window instantiates LHS, then its result is equal to RHS”. The basic ele- 
ments in ensures consist of method calls, including method signature, method argu- 
ments and returns, but no class fields. Therefore, algebraic specifications are in a 
higher-level abstraction than axiomatic specifications are. Usually there are no ex- 
plicit requires in algebraic specifications. Indeed, sequencing constraints, which are 
requires, can be derived from the axiom whose RHS is an error or exception [4]. 
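
The LHS=RHS reading of axioms can be made concrete with a small immutable stack. The stack operations, the two axioms, and the sample values below are illustrative assumptions; plain `==` is used as the equality check here, which is a simplification of the equality mechanisms used by real tools.

```python
from typing import Callable, List, Tuple

Stack = Tuple[int, ...]


def push(stack: Stack, x: int) -> Stack:
    return stack + (x,)


def pop(stack: Stack) -> Stack:
    if not stack:
        raise IndexError("pop from empty stack")
    return stack[:-1]


def top(stack: Stack) -> int:
    return stack[-1]


# Each axiom: (description, LHS, RHS), both sides as functions of (s, x).
AXIOMS: List[Tuple[str, Callable, Callable]] = [
    ("pop(push(s, x)) = s", lambda s, x: pop(push(s, x)), lambda s, x: s),
    ("top(push(s, x)) = x", lambda s, x: top(push(s, x)), lambda s, x: x),
]


def violated_axioms(axioms, samples) -> List[str]:
    """Return the axioms whose LHS and RHS differ on some sample."""
    return [
        name
        for name, lhs, rhs in axioms
        if any(lhs(s, x) != rhs(s, x) for s, x in samples)
    ]


SAMPLES = [((), 1), ((1, 2), 3)]
print(violated_axioms(AXIOMS, SAMPLES))  # [] when all axioms hold
```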

2.2 Dynamic Specification Inference 

Dynamic specification inference discovers operational abstractions from the execu- 
tions of tests [20]. An operational abstraction is syntactically identical to a formal 
specification. The discovered operational abstractions consist of those properties that 
hold for all the observed executions. These abstractions can be used to approximate 
specifications or indicate the deficiency of tests. 

Ernst et al. [10] develop a dynamic invariant detection tool, called Daikon, to infer 
likely axiomatic specifications from executions of test suites. It examines the variable 
values that a program computes, generalizes over them, and reports the generaliza- 
tions in the form of pre/post-conditions and class invariants. 
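
The underlying idea, reporting those candidate properties that hold on every observed execution, can be shown in miniature. The sketch below is not Daikon's algorithm or interface; the candidate set and names are invented, and a real tool works over many variables and program points.

```python
from typing import Callable, Dict, List

# Candidate properties over a single observed integer variable x.
CANDIDATES: Dict[str, Callable[[int], bool]] = {
    "x >= 0": lambda x: x >= 0,
    "x != 0": lambda x: x != 0,
    "x is even": lambda x: x % 2 == 0,
}


def likely_invariants(observations: List[int]) -> List[str]:
    """Keep the candidates that no observed value falsifies."""
    return [
        name
        for name, holds in CANDIDATES.items()
        if all(holds(x) for x in observations)
    ]


print(likely_invariants([2, 4, 8]))  # all three candidates survive
print(likely_invariants([0, 3]))     # only "x >= 0" survives
```

The second call shows how additional test executions can falsify properties that a weaker test suite would have reported, which is why the quality of the inferred specifications depends on the tests.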

Whaley et al. [31] develop a tool to infer likely protocol specifications from 
method call traces collected while a Java class interface is being used. These specifi- 
cations are in the form of multiple finite state machines, each of which contains 
methods accessing the same class field. Ammons et al. [1] develop a tool to infer 
likely protocol specifications from C method call traces by using an off-the-shelf 
probabilistic finite state automaton learner. Hagerer et al. [19] present the regular 
extrapolation technique to discover protocol specifications from execution traces of 
reactive systems. 

Henkel and Diwan [21] develop a tool to derive a large number of terms for a Java 
class and generate tests to evaluate them. The observational equivalence technique [3, 
9] is used to evaluate the equality among these terms. Based on the evaluation results, 
equations among these terms are proposed, and are further generalized to infer axi- 
oms in algebraic specifications. 



2.3 Specification-Based Test Generation 

We categorize specification-based test generation into test generation for functional- 
ity and test generation for robustness. Test generation for functionality generates tests 
that satisfy requires, and checks whether ensures are satisfied during test executions. 
Test generation for robustness generates tests that may not satisfy requires, and 
checks whether a program can handle these test executions gracefully, such as throw- 
ing appropriate exceptions. 

We divide the test generation problem into three sub-problems: object state setup, 
method parameter generation, and method sequence generation. Object state setup 
puts the class under test into particular states before invoking methods on it. Method 
parameter generation produces particular arguments for methods to be invoked. 
Method sequence generation creates particular method call sequences to exercise the 
class on certain object states. Axiomatic specifications provide more guidance on 
both method parameter generation and object state setup, whereas algebraic specifica- 
tions provide more guidance on method sequence generation. Protocol specifications 
provide more guidance on both object state setup and method sequence generation. 

Dick and Faivre develop a tool to reduce axiomatic specifications to a disjunctive 
normal form and generate tests based on them [8]. Boyapati et al. develop a tool to 
generate tests effectively by filtering the test input space based on preconditions in 
axiomatic specifications [5]. Gupta develops a structural testing technique to generate 
test inputs to exercise some particular post-conditions or assertions [16]. A commercial 
Java unit testing tool, ParaSoft Jtest [24], can automatically generate test inputs to 
perform white box testing when no axiomatic specifications are provided, and perform 
black box testing when axiomatic specifications are provided. 

There is a variety of test generation techniques based on protocol specifications, 
which are in the form of finite state machines [25]. There are several test generation 
tools based on algebraic specifications. They generate tests to execute the LHS and 
RHS of an axiom. The DAISTS tool developed by Gannon et al. [13] and the Daistish 
tool developed by Hughes and Stotts [23] use an implementation-supplied equality 
method to compare the results of LHS and RHS. A tool developed by Bernot et al. [3] 
and the ASTOOT tool developed by Doong and Frankl [9] use observational equivalence 
to determine whether LHS and RHS are equal. 



3 Feedback Loop Framework 

Our approach can be viewed as a black box into which a developer feeds a program 
and its existing tests, and from which the developer gets a set of valuable tests, in- 
ferred specifications, and reasons why these tests are valuable. Then the developer 
can inspect the valuable tests and inferred specifications for problems. Our feedback 
loop framework consists of multiple iterations. Each iteration is given the program, a 
set of tests, and specifications inferred from the previous iteration (except for the first 
iteration). After each iteration, a complete set of new tests, a valuable subset of new 
tests, reasons for being valuable, and new inferred specifications are produced. The 
subsequent iteration is given the original tests augmented by the complete set of new 
tests or the valuable subset of new tests, as well as the new inferred specifications. 
Optionally the developer can specify some iteration-terminating conditions, such as a 
stack size being equal to the maximum capacity, or the number of iterations reaching 
the specified number. The iterations continue until user-specified conditions are satis- 
fied and there is no new test that violates specifications inferred in the previous itera- 
tion. 
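
The iteration scheme can be sketched on a toy example. Everything in the following code is an assumption made for illustration: the program under test clamps an integer into [0, 10], the "inferred specification" is merely the range of return values observed so far, and test generation probes just outside the inputs exercised so far. It is not the implementation of our framework, only a small instance of the loop structure.

```java
import java.util.TreeSet;

class FeedbackLoopToy {
    /** Hypothetical program under test: clamps x into [0, 10]. */
    static int program(int x) { return Math.max(0, Math.min(10, x)); }

    /**
     * Runs the loop on a non-empty set of test inputs and returns the final
     * inferred range {lo, hi} of return values. The set is augmented in place.
     */
    static int[] run(TreeSet<Integer> tests, int maxIterations) {
        int lo = 0, hi = 0;
        for (int iteration = 0; iteration < maxIterations; iteration++) {
            // Specification inference: range of results observed on the current tests.
            lo = Integer.MAX_VALUE;
            hi = Integer.MIN_VALUE;
            for (int t : tests) {
                int r = program(t);
                lo = Math.min(lo, r);
                hi = Math.max(hi, r);
            }
            // Test generation: probe just outside the inputs exercised so far.
            int[] generated = { tests.first() - 1, tests.last() + 1 };
            // Test selection: keep tests whose execution violates the inferred range.
            TreeSet<Integer> selected = new TreeSet<>();
            for (int g : generated) {
                int r = program(g);
                if (r < lo || r > hi) selected.add(g);
            }
            if (selected.isEmpty()) break;  // no new test violates the inferred specification
            tests.addAll(selected);         // augment the tests for the subsequent iteration
        }
        return new int[] { lo, hi };
    }
}
```

Starting from the inputs 4 and 5, each iteration widens the inferred range until the generated tests no longer violate it; the parameter maxIterations plays the role of a user-specified terminating condition.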

Figure 1 shows an overview of the feedback loop framework. The framework defines 
four stages for each iteration: trace collection, specification inference, test generation, 
and test selection. Human intervention is only needed for inspecting selected 
tests and inferred specifications at the end of the feedback loop. However, human intervention 
may be incorporated at the end of each iteration and should improve results. 

In the trace collection stage, the given tests are run on the instrumented Java pro- 
gram and traces are collected from the executions. Object states are defined by some 
particular relevant class field values. The values of method arguments, returns, and 
object states are recorded at the entry and exit of a method execution. To collect object 
states, we instrument invocations of this.equals(this) at the entry and exit of each 
public method in the Java class file. Then we monitor the class field values accessed 
by the execution of this.equals(this). These values are collected as object states. The 
collected method arguments, returns, and object states are used in the specification 
inference stage and the test generation stage. 




Fig. 1. An overview of the feedback loop framework 

In the specification inference stage, the collected traces are used to infer specifica- 
tions. The axiomatic and protocol specification inference techniques in Section 2.2 
are used in this stage. Instead of using the algebraic specification inference technique 
based on observational equivalence [21], we develop a tool prototype to infer alge- 
braic specifications based on an implementation-supplied equality method. Since it is 
expensive to execute the equality method to compare object states among all method 
executions, we use object states collected in the trace collection stage to compare the 
object states offline. Based on a set of pre-defined axiom-pattern templates, the tool 
looks for equality patterns among collected object states, method arguments, and 
returns of methods. We infer algebraic specifications by using these equality patterns 
as axioms. 
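
As an illustration of offline pattern matching, the following sketch checks one hypothetical axiom-pattern template, g(f(S)) == S, against recorded object states. The TraceEntry layout, the string encoding of object states, and the template itself are assumptions made for this example; they are not taken from our tool prototype.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** One recorded method execution; object states are encoded as strings of field values. */
class TraceEntry {
    final String method;       // method name, e.g. "push" or "pop"
    final String stateBefore;  // object state at method entry
    final String stateAfter;   // object state at method exit

    TraceEntry(String method, String stateBefore, String stateAfter) {
        this.method = method;
        this.stateBefore = stateBefore;
        this.stateAfter = stateAfter;
    }
}

class AxiomInference {
    /**
     * Template g(f(S)) == S: for consecutive executions f, g chained on the same
     * object state, the state after g must equal the state before f. An axiom is
     * reported only if it holds for every chained pair observed in the trace.
     */
    static List<String> inferInverseAxioms(List<TraceEntry> trace) {
        Map<String, Boolean> holds = new LinkedHashMap<>();
        for (int i = 0; i + 1 < trace.size(); i++) {
            TraceEntry f = trace.get(i);
            TraceEntry g = trace.get(i + 1);
            if (!f.stateAfter.equals(g.stateBefore)) continue;  // not chained on one state
            String axiom = g.method + "(" + f.method + "(S)) == S";
            boolean ok = f.stateBefore.equals(g.stateAfter);    // offline state comparison
            holds.merge(axiom, ok, Boolean::logicalAnd);
        }
        List<String> axioms = new ArrayList<>();
        for (Map.Entry<String, Boolean> e : holds.entrySet()) {
            if (e.getValue()) axioms.add(e.getKey());
        }
        return axioms;
    }
}
```

Because the comparison uses only recorded states, the equality method does not need to be executed again during inference.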

In the test generation stage, inferred specifications are used to guide test genera- 
tion. Jtest [24] is used to automatically generate tests based on axiomatic specifica- 
tions. In protocol and algebraic specification-based test generation, we grow new 
object states and method parameters based on the collected traces in the present itera- 
tion. In addition, we generate the method sequences based on inferred protocol and 
algebraic specifications. 

Because inferred preconditions in axiomatic specifications may be overcon- 
strained, only generating test inputs that satisfy them would leave some interesting 
legal test inputs out of scope. One solution is to remove all the inferred preconditions 
before the specifications are used to guide test generation. Then both legal and illegal 
test inputs can be generated. Allowing some illegal inputs can still be useful in testing 
program robustness. However, removing inferred preconditions makes test generation 
based on preconditions unguided. In future work, we plan to investigate techniques to 
remove or relax parts of inferred preconditions. There are similar overconstraint 
problems with protocol specifications. To address these problems, we can deliberately 
generate some method sequences that do not follow the transitions in the inferred 
finite state machines. For example, we can generate test sequences to exercise 
the complement of the inferred finite state machines. The test generation based on 
inferred algebraic specifications also needs some adaptations. If not all the combina- 
tions of method pairs are exercised, we need to generate tests to exercise those un- 
covered method pairs besides those method pairs in the inferred axioms. 
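
One simple way to approximate the complement is to list, for each inferred state, the methods for which no transition was inferred. The sketch below assumes a hypothetical encoding of the finite state machine as a map from state to (method to next state); the state and method names in the usage note are likewise illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class ComplementSequences {
    /**
     * fsm maps each state to its inferred transitions (method name -> next state).
     * Returns "state: method" for every (state, method) pair without an inferred
     * transition, in sorted order; each pair is a candidate call that leaves the
     * inferred protocol.
     */
    static List<String> uncoveredCalls(Map<String, Map<String, String>> fsm,
                                       Set<String> methods) {
        Map<String, Map<String, String>> sortedFsm = new TreeMap<>(fsm);
        Set<String> sortedMethods = new TreeSet<>(methods);
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> state : sortedFsm.entrySet()) {
            for (String method : sortedMethods) {
                if (!state.getValue().containsKey(method)) {
                    result.add(state.getKey() + ": " + method);
                }
            }
        }
        return result;
    }
}
```

For a stack whose inferred machine has the transitions Empty-push->NonEmpty, NonEmpty-push->NonEmpty, and NonEmpty-pop->Empty, the only uncovered call is pop in the state Empty; a test reaching Empty and then calling pop exercises the complement.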

In the test selection stage, the generated tests are executed, and checked against the 
inferred specifications. Two types of tests are selected for inspection. The first type of 
test is the test whose execution causes an uncaught runtime exception or a program 
crash. If the test is a legal input, it may expose a program fault. The second type of 
test is the test whose execution violates ensures in axiomatic specifications or alge- 
braic specifications. If the ensures violated by the test are overconstrained ones, this 
may indicate the insufficiency of the existing tests. If the violated ensures are actual 
ones and the test is a legal input, it may expose a program fault. These selected tests 
are collected as candidates for valuable tests. At the end of the feedback loop, the 
developer can inspect these selected tests and their violated specifications for prob- 
lems. If a selected test input is an illegal one, the developer can either add precondi- 
tions to guard against this test input in the subsequent iteration, or adopt defensive 
programming to throw appropriate exceptions for this test input. If a selected test 
input is a legal one and it exposes a program fault, the developer can fix the bug that 
causes the fault, and augment the regression test suite with this test input after adding 
an oracle for it. If a selected test input is a legal one and it does not expose a fault but 
exercises certain new program behavior, the developer can add it to the regression test 
suite together with its oracle. Besides selecting these two types of tests, the developer 
can also select those tests that exercise at least one new structural entity, such as 
statement or branch. 

In our experiments with the feedback loop for axiomatic specifications, the num- 
ber of selected tests is not large, which makes the human inspection effort affordable 
[34]. In addition, the selected tests have a high probability of exposing faults or exercising 
new program behavior. We observed similar phenomena in our preliminary 
experiment with the feedback loop for algebraic specifications. 

The selected tests, or all the newly generated tests from the present iteration, are 
used to augment the existing tests in the subsequent iteration. The inferred specifica- 
tions from the present iteration are also used in the specification inference stage of the 
subsequent iteration. In this specification inference stage, conditional specifications 
might be inferred to refine some of those specifications that are violated by the gener- 
ated tests in the present iteration. 



4 Related Work 

There have been several lines of work that use feedback loops in static analyses. Ball 
and Rajamani construct a feedback loop between program abstraction and model 
checking to validate user-specified temporal safety properties of interfaces [2]. 
Flanagan and Leino use a feedback loop between annotation guessing and theorem 
proving to infer specifications statically [12]. Wild guesses of annotations are automatically 
generated based on heuristics before the first iteration. Human interventions 
are needed to insert manual annotations in subsequent iterations. Giannakopoulou et 
al. construct a feedback loop between assumption generation and model checking to 
infer assumptions for a user-specified property in compositional verification [14, 7]. 
Given crude program abstractions or properties, these feedback loops in static analy- 
ses use model checkers or theorem provers to find counterexamples or refutations. 
Then these counterexamples or refutations are used to refine the abstractions or prop- 
erties iteratively. Our work is to construct a feedback loop in dynamic analyses, cor- 
responding to the ones in static analyses. Our work does not require users to specify 
properties, which are inferred from test executions instead. 

Naumovich and Frankl propose to construct a feedback loop between finite state 
verification and testing to dynamically confirm the statically detected faults [26]. 
When a finite state verifier detects a property violation, a testing tool uses the viola- 
tion to guide test data selection, execution, and checking. The tool hopes to find test 
data that shows the violation to be real. Based on the test information, human inter- 
vention is used to refine the model and restart the verifier. This is an example of a 
feedback loop between static analysis and dynamic analysis. Another example of a 
feedback loop between static analysis and dynamic analysis is profile-guided optimi- 
zation [30]. Our work focuses on the feedback loop in dynamic analyses. 

Peled et al. present the black box checking [29] and the adaptive model checking 
approach [15]. Black box checking tests whether an implementation with unknown 
structure or model satisfies certain given properties. Adaptive model checking per- 
forms model checking in the presence of an inaccurate model. In these approaches, a 
feedback loop is constructed between model learning and model checking, which is 
similar to the preceding feedback loops in static analyses. Model checking is per- 
formed on the learned model against some given properties. When a counterexample 
is found for a given property, the counterexample is compared with the actual system. 
If the counterexample is confirmed, a fault is reported. If the counterexample is re- 
futed, it is fed to the model learning algorithm to improve the learned model. Another 
feedback loop is constructed between model learning and conformance testing. If no 
counterexample is found for the given property, conformance testing is conducted to 
test whether the learned model and the system conform. If they do not conform, the 
discrepancy-exposing test sequence is fed to the model learning algorithm, in order to 
improve the approximate model. Then the improved model is used to perform model 
checking in the subsequent iteration. The dynamic specification inference in our feedback 
loop corresponds to the model learning in their feedback loop, and the 
specification-based test generation in our feedback loop corresponds to the conformance 
testing in their feedback loop. Our feedback loop does not require 
given properties, whereas their feedback loop requires user-specified properties in order to 
perform model checking. 

Gupta et al. use a feedback loop between test data generation and branch predicate 
constraint solving to generate test data for a given path [17]. An arbitrarily chosen 
input from a given domain is executed to exercise the program statements relevant to 
the evaluation of each branch predicate on the given path. Then a set of linear constraints 
is derived. These constraints can be solved to produce the increments for the 
input. These increments are added to the current input in the subsequent iteration. The 
specification inference in our work corresponds to the branch predicate constraints 
in their approach. Our work does not require users to specify a property, 
whereas the work of Gupta et al. requires users to specify the path to be covered. 



5 Conclusion 

We have proposed a feedback loop between specification-based test generation and 
dynamic specification inference. This feedback loop can mutually enhance both test 
generation and specification inference. The feedback loop provides aids in test gen- 
eration by improving the underlying specifications, and aids in specification inference 
by improving the underlying test suites. We have implemented a feedback loop for 
axiomatic specifications, and demonstrated its usefulness [33, 34]. We have developed 
an initial implementation of the feedback loop for algebraic specifications, and plan 
to do more experiments and refine the implementation. In future work, we plan to 
implement and experiment with the feedback loop for protocol specifications. At the same 
time, the following research questions are to be further investigated. In the first itera- 
tion, the inferred specifications can be used to generate a relatively large number of 
new tests. In the subsequent iterations, the marginal improvements on tests and speci- 
fications come from the specification refinement and object state growth. We need to 
explore effective ways to maximize these marginal improvements. We also plan to 
investigate other SQA methods, such as static verification techniques, in evaluating 
the quality of the inferred specifications in iterations [27]. 



Abstract. Writing specifications in the Java Modeling Language (JML) has long been 
accepted as a practical approach to increasing the correctness and 
quality of Java programs. However, the current JML testing system (the JML 
and JUnit framework) can only generate skeletons of the test fixture and test case 
class. Writing code to generate test cases, especially those with complicated 
data structures, is still a labor-intensive job when testing programs annotated 
with JML specifications. 

This paper presents JMLAutoTest, a novel framework for automated testing of 
Java programs annotated with JML specifications. Firstly, given a method, 
three test classes (a skeleton of the test client class, a JUnit test class and a test case 
class) can be generated. Secondly, JMLAutoTest can generate all nonisomorphic 
test cases that satisfy the requirements defined in the test client class. 
Thirdly, JMLAutoTest can avoid most meaningless cases by running the test in 
a double-phase way, which saves much of the time spent exploring meaningless cases in 
the test. This method can be adopted in testing not only Java programs, 
but also programs written in other languages. Finally, JMLAutoTest executes 
the method and uses the JML runtime assertion checker to decide whether its 
post-condition is violated, that is, whether the method works correctly. 



1 Introduction 

Writing specifications in the Java Modeling Language (JML) has been viewed as an effective 
and practical way of increasing the correctness and quality of Java programs, owing to 
JML's great expressiveness and its simple grammar, which is similar to Java's. In addition, 
JML allows assertions to be intermixed with Java code [3, 6], which is convenient 
for Java programmers. In the past few years, many tools have been implemented 
to support JML, including the compiler, the runtime assertion checker [1], and its 
testing framework [2]. The current JML testing system (the JML and JUnit testing 
framework) can generate the test fixture and the skeleton of the test case class automatically, 
which frees programmers from writing unit test code and thus makes unit 
testing more accessible to programmers. 



1.1 The Problem 

However, programmers still need to spend a lot of time writing code to generate 
test cases, especially those which represent complicated data structures (e.g., binary 
trees and linked lists). There have been some testing tools which can automatically generate 
test cases, such as Korat [4] and Alloy Analyzer (AA) [22], but they either supply the 
whole generated test case space to the test, regardless of how many test cases 
are meaningless* ones, or require programmers to write special predicates to get rid of 
meaningless test cases. For example, in Korat [4] programmers should write an additional 
method “public boolean repOk” in the input class to keep the generated test 
cases meaningful. In our opinion, first, identifying meaningless cases only when the 
test is run is not enough: a test with many meaningless inputs can tell little 
about the execution of the tested method, even if it is based on a very large 
test case space, and it might spend a lot of time dealing with meaningless ones. So 
avoiding meaningless test cases before running the test is very important. Second, 
in many cases the tester is not the one who developed the class to be 
tested, or the test is a black box test; therefore, handling test cases by depending entirely on 
predicates provided by programmers is not practical. 

Fig. 1. The work flow of JMLAutoTest 



1.2 How JMLAutoTest Deals with These Problems 

In this paper, we present JMLAutoTest, an automatic testing framework which can 
solve these problems well. Given a method annotated with a JML specification, similar 
to the JML and JUnit testing framework [2], JMLAutoTest first generates a JUnit 
test class (*_JML_Test), which sets up the test fixture, and a test case provider class 
(*_JML_TestCase), which is a subclass of the test class. In addition, JMLAutoTest 
generates the skeleton of another class called the test client (JMLTestClient), in which 
programmers can easily set class domains for the classes of the arguments of the method to be 
tested, or field domains for the fields in those classes. Then, when the test is run, the test 
case provider can get test cases from the test client automatically. 

* A meaningless test case here means one that violates the pre-condition of the method to be 
tested. 

JMLAutoTest can avoid most meaningless test cases by running the test in a double-phase 
way (runPreTest and runFinalTest in Figure 1). The double-phase way is a 
statistics-based testing approach. First, programmers define a criterion that 
divides the whole test case space into several partitions. This criterion is somewhat like 
the “Operational Profile” in Cleanroom testing [5, 15], so here we also call it the operational 
profile. Programmers define the operational profile in the method makeOperationalProfile 
(shown in Figure 1) in the test client. After the test case provider gets 
the generated test cases, it passes the whole test case space to the method makeOperationalProfile. 
This method then divides the test case space into several partitions 
according to the criteria set by programmers and returns these partitions. During 
the first phase, a test is run separately on each group of test cases chosen from these partitions. 
Each group contains only a relatively small number (a few dozen) of test 
cases. 

Then, based on these statistics, we can know which partition of the test case 
space produces the most meaningless test cases, which produces the second most, 
and so on. Thus, the probability of meaningless test cases in each partition can be 
estimated after the first-phase test (the pre-test). During the second phase, a large number 
of test cases is taken from each partition according to the proportion 
obtained in the first-phase test. In this way, meaningless test cases can 
be avoided to a certain extent, and programmers who run the test only need to define 
the operational profile without knowing the details of the method to be tested. However, 
the validity of this approach depends on the quality of the operational profile used to 
create the partitions. This way of testing can be applied not only to Java programs, 
but also to programs in other languages. 
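
The second-phase allocation can be sketched as follows. The exact formula is an assumption made for this illustration: each partition receives a share of the second-phase budget proportional to the number of meaningful (precondition-satisfying) test cases it produced in the pre-test. The class and method names are hypothetical and are not part of JMLAutoTest.

```java
class DoublePhaseAllocation {
    /**
     * meaningfulInPreTest[i] is the number of pre-test cases from partition i
     * that satisfied the precondition. Returns how many second-phase test cases
     * to draw from each partition, proportional to that count (rounded down).
     */
    static int[] allocate(int[] meaningfulInPreTest, int budget) {
        int total = 0;
        for (int count : meaningfulInPreTest) total += count;
        int[] quota = new int[meaningfulInPreTest.length];
        if (total == 0) return quota;  // the pre-test found no meaningful case at all
        for (int i = 0; i < quota.length; i++) {
            quota[i] = (int) ((long) budget * meaningfulInPreTest[i] / total);
        }
        return quota;
    }
}
```

For instance, if three partitions yield 18, 6, and 0 meaningful cases in the pre-test and the second phase may run 1000 test cases, this rule draws 750, 250, and 0 cases from them respectively, so the partition that produced only meaningless cases is skipped.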

The rest of this paper is organized as follows. Section 2 presents the algorithm and 
principle that JMLAutoTest uses to generate test cases. Section 3 describes how double-phase 
testing works. Section 4 presents the test oracle generation for testing methods. 
Section 5 illustrates some experimental results. Section 6 reviews related work 
and Section 7 concludes. 



2 Test Case Generation 

The whole procedure of test case generation in JMLAutoTest includes three parts: 
finitization, validity checking and nonisomorphism. Figure 2 gives an overview of the 
algorithm of JMLAutoTest test case generation for the type “jmlautotest.example”. 
JMLObjectSequence is a utility class defined in the package org.jmlspecs.models [3, 
6] which implements a sequence to contain objects. We use the class Finitization (Section 
2.1) to perform the generation. After a test case candidate is generated, 
JMLAutoTest uses the JML runtime assertion checker to check its validity (Section 2.2) 
and discards it if it is invalid. Then JMLAutoTest visits the existing test cases to make sure 
that this candidate is not isomorphic to any test case already in the test case 
space (Section 2.3). 

// The following method is defined in the test client
public JMLObjectSequence makeCase_jmlautotest_example() {
    JMLObjectSequence cases = new JMLObjectSequence();
    Finitization f = new Finitization(jmlautotest.example.class);

    // Create the value domain1 which contains 5 instances of
    // the class "example.field1.class" for the field "field1"
    JMLObjectSequence valuedomain1
        = f.createObjects(example.field1.class, 10);

    // set value domain for the field "field1"
    f.set(example.field1, valuedomain1);
    f.generate();
    cases = f.getResult();
    return cases;
}



Fig. 2. The overview of the method makeCase_* in the test client 

2.1 Finitization 

JMLAutoTest provides a class Finitization for programmers to generate a finite test 
case space of any kind. A finitization works in two steps: setting the value domains 
and generating. 

Set Value Domain for Fields of the Input Class. Programmers can bind a 
field to a set of bounds by setting the value domain for the field. Then 
JMLAutoTest creates candidate objects by assigning to each field every possible 
value from its corresponding domain. A field domain is represented by an object 
either of the type JMLObjectSequence, which contains objects, or of the type 
JMLPrimSequence, which is another utility class that contains values of primitive types. 
All of these domains are inserted into a hash table for use during generation. 
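
Assigning every possible value to every field amounts to enumerating the Cartesian product of the field domains. The following sketch shows this enumeration in isolation; it represents a candidate as a map from field name to value and uses plain Java collections instead of JMLObjectSequence and JMLPrimSequence, so the class and method names are assumptions and not the Finitization API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CandidateEnumerator {
    /**
     * domains maps each field name to its value domain. Returns one candidate
     * (field name -> chosen value) for every combination of domain values.
     */
    static List<Map<String, Object>> enumerate(Map<String, List<Object>> domains) {
        List<Map<String, Object>> candidates = new ArrayList<>();
        candidates.add(new LinkedHashMap<String, Object>());  // start with the empty assignment
        for (Map.Entry<String, List<Object>> field : domains.entrySet()) {
            List<Map<String, Object>> extended = new ArrayList<>();
            for (Map<String, Object> partial : candidates) {
                for (Object value : field.getValue()) {
                    Map<String, Object> next = new LinkedHashMap<>(partial);
                    next.put(field.getKey(), value);           // bind this field to one value
                    extended.add(next);
                }
            }
            candidates = extended;
        }
        return candidates;
    }
}
```

The number of candidates is the product of the domain sizes, which is why the validity checking of Section 2.2 is needed to prune the space.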

Generate Test Case Space. There are two kinds of method generate in the class 
Finitization. One generates the test case space for common classes and the other 
generates test cases for special classes which implement a linked data structure 
such as a binary tree or a linked list. 

Figure 3 illustrates how to generate test cases for a class which implements a linked 
data structure (a linked list). The invariant clause at the beginning is explained in 
Section 2.2. The method generate here takes two arguments. The first one is the 
name of the first node (or the root node) field in this recursive structure. The second 
one is an array of String which contains the names of the pointer fields in this data structure. 
In the linked list example shown in Figure 3, the first argument of the method generate 
is the string “first”, the name of the field first in the class LinkedList which represents the 
first node of a linked list. The second one is a string array which contains only 
“pointer”, the name of the pointer field in the linked list. 
In this kind of generation, the field first and the pointer field pointer 
share the same value domain. Each element of this value domain except null can be used only 
once. After an element is used, it is removed from the domain. A 
special stack visitedStack is kept during the recursion to hold the used elements of the 
value domain. Another stack called fNames holds the names of pointer fields. 
In the next recursion, one element of the value domain is assigned to the pointer 
field named by the first element in fNames, in the object given by the first 
element in visitedStack. 

For example, when JMLAutoTest is generating a binary tree, the states of the two 
stacks and the value domain are illustrated in Figure 4. At the beginning, the stack visitedStack 
contains only binarytreeObj, which is an object of the input class BinaryTree. The 
value domain contains four elements: N0, N1, N2 and null. During the first recursion, N0 
is assigned to the field root, which is named by the first element in fNames, of the 
object binarytreeObj given by the first element in visitedStack; that is, 
binarytreeObj.root = N0. Then the first element in both fNames and visitedStack is 
removed. Also, the used element N0 is inserted into visitedStack and its two pointer 
fields left and right are inserted into fNames. The recursion follows this algorithm until it 
reaches one of two states. One is that the value domain contains only null, and the other is 
that all elements in visitedStack are null. The first state means a candidate has been found, 
while the second one means a failure. 
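
The rule that each non-null domain element is used at most once can be illustrated for the simpler case of a singly linked list. The sketch below is a simplification under stated assumptions: it replaces the two-stack bookkeeping (visitedStack and fNames) with plain recursion and a used-flag array, it identifies nodes by integer IDs, and it records every list obtained by assigning null to the current pointer field. It is not the generate method of Finitization.

```java
import java.util.ArrayList;
import java.util.List;

class LinkedListShapes {
    /**
     * Returns every list over nodes 0..nodeCount-1 in which each node occurs at
     * most once. A list [2, 0] stands for first = N2, N2.pointer = N0,
     * N0.pointer = null.
     */
    static List<List<Integer>> generate(int nodeCount) {
        List<List<Integer>> out = new ArrayList<>();
        extend(new ArrayList<Integer>(), new boolean[nodeCount], out);
        return out;
    }

    private static void extend(List<Integer> prefix, boolean[] used,
                               List<List<Integer>> out) {
        // Assign null to the current pointer field: the list ends here.
        out.add(new ArrayList<>(prefix));
        // Or assign any unused node to the current pointer field and recurse.
        for (int id = 0; id < used.length; id++) {
            if (used[id]) continue;
            used[id] = true;                   // remove the element from the domain
            prefix.add(id);
            extend(prefix, used, out);
            prefix.remove(prefix.size() - 1);  // backtrack
            used[id] = false;                  // restore the element
        }
    }
}
```

With three nodes this enumeration yields 1 + 3 + 6 + 6 = 16 candidate lists, before any validity or isomorphism filtering.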

To accelerate the generation of a linked data structure, programmers can choose to 
generate the test case space in a fast way. If the value of the field acceleratingEnabled in 
the class Finitization is true, JMLAutoTest considers each structure only once, 
regardless of the non-pointer fields. JMLAutoTest implements this optimization by 
assigning values to each pointer field only twice: the first time, a non-null value from 
the value domain is assigned to the pointer field, and the second time the 
pointer field is set to null. Generating all cases of a binary tree with 7 nodes costs 1 second, 
while in the normal way it costs more than 1000 seconds because JMLAutoTest handles 
many more candidates. However, the test case space 
generated in the fast way is limited: it contains every shape of the structure 
but ignores the non-pointer fields. 

2.2 Validity Checking 

After a candidate object is generated, JMLAutoTest checks its validity to judge 
whether it can be used as a test case. The invariant clause in Figure 3 says that if the 
field first is null, then the field length must be zero, and if first is not null, then the field 
length must equal the actual number of nodes in the list. (The method getLength returns the 
value of the field length and the method toObjectSet transforms the linked list 
into a set of nodes. Both of them are omitted in Figure 3.) The validity checking in 
JMLAutoTest depends entirely on the instance invariant specified in the input class. 

The Invariant Clause in JML. An invariant [1] is a condition that remains true 
during the execution of a segment of code. An instance invariant, which constrains 
both static and non-static states of program execution, can refer to both static and 
instance members. In JML, invariants belong to both pre-state and post-state 
specifications: they are checked in the pre-state, i.e., right after a method call and 
argument passing but just before the evaluation of the method body, and in the post-state, 
i.e., right after the evaluation of the method body but just before the method returns. 

//@ public invariant (first == null && getLength() == 0)
//@     || (first != null && getLength() == toObjectSet().size());
public class LinkedList {
    public Node first;      // the first node of a linked list
    protected int length;   // the length of this list
}

public class Node {
    public int ID;          // node ID
    public Node pointer;    // a pointer pointing to the next node
}

public JMLObjectSequence makeCase_LinkedList() {
    Finitization f = new Finitization(LinkedList.class);

    // create 3 instances of Node with an argument array [0,1,2] for the
    // constructor
    JMLObjectSequence nodes = f.createObjects(Node.class,
        new JMLPrimSequence(new int[]{0, 1, 2}), 3);
    f.add(nodes, null);     // add null to this domain
    f.set("first", nodes);  // set the value domain for "first"

    // set domain for the field "length"
    f.set("length", new JMLPrimSequence(1, 4));

    // Generate candidates recursively
    f.generate("first", new String[]{"pointer"});
    return f.getResult();
}



Fig. 3. Generating the test case space for the class LinkedList 

Fig. 4. The states of the value domain and the stacks at the beginning and 
after one recursion when generating a binary tree 



76 Guoqing Xu and Zongyuang Yang 

public void checkInv$instance$T() {
    boolean rac$v = false;
    [P, rac$v]
    if (!rac$v) { throw new JMLInvariantError(); }
}

Fig. 5. The method to check the instance invariant translated by JML runtime assertion checker 

public boolean checkValidity(Object obj) throws Exception {
    try {
        // get the name of the input class
        String className = obj.getClass().getName();

        // get the name of the method of checking invariant
        String invName = "checkInv$instance$" + className;

        // get the method of checking invariant
        Method thisMethod = obj.getClass().getMethod(invName, new Class[]{});

        // invoke this method
        thisMethod.invoke(obj, new Object[]{});
    }
    catch (java.lang.NoSuchMethodException ex) {
        // There is no such method in the class
        throw new Exception("code for class " + obj.getClass().getName()
            + " was not compiled with jmlc so no assertions will be checked");
    }
    catch (JMLInvariantError ex) {
        return false;  // Invariant has been violated.
    }
    return true;  // Okay
}

Fig. 6. Check whether the candidate is valid 

Let $T$ be a type with a set of instance invariants, $P_1, \dots, P_n$. The invariants are first
conjoined to form a single invariant predicate, i.e., $P \equiv P_1 \wedge \dots \wedge P_n$. The conjoined
invariant predicate is translated into an instance invariant method, whose general
structure is shown in Figure 5. The notation [P, rac$v] denotes translated code that
evaluates the predicate P and stores the result into the variable rac$v. The invariant
method evaluates the conjoined invariant predicate and throws an invariant violation
error if the predicate does not hold.

Invariant Checking in JMLAutoTest. Figure 6 illustrates how the invariant is checked
in JMLAutoTest. If the method checkInv$instance$T is not found in the input class,
the class was not compiled by the JML compiler, and a new exception is thrown.
If a JMLInvariantError is caught while the method is being invoked, the candidate
is not valid because the invariant is violated. If no exception is caught, the
candidate is valid and the method checkValidity returns true. If no invariants are specified
in the input class, the method checkValidity always returns true.
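
The reflective check of Figure 6 can be sketched in plain Java as follows. This is a minimal illustration under stated assumptions, not the JMLAutoTest source: InvariantError is a hypothetical stand-in for JMLInvariantError, Counter is a hypothetical candidate class whose invariant method is written by hand (jmlc would normally generate it), and the method name is built from the simple class name. One detail of the Java reflection API matters here: Method.invoke wraps anything thrown by the invoked method in an InvocationTargetException, so the sketch inspects the cause.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

final class ValiditySketch {
    /** Hypothetical stand-in for JMLInvariantError. */
    static class InvariantError extends Error {}

    /** Hypothetical candidate class; its invariant is count >= 0. */
    public static class Counter {
        public int count;
        public Counter(int count) { this.count = count; }
        // Hand-written here; jmlc would generate the real method.
        public void checkInv$instance$Counter() {
            if (count < 0) { throw new InvariantError(); }
        }
    }

    /** Returns true iff the invariant method runs without reporting a violation. */
    static boolean checkValidity(Object obj) {
        // Assumption: the method name uses the simple class name.
        String invName = "checkInv$instance$" + obj.getClass().getSimpleName();
        try {
            Method m = obj.getClass().getMethod(invName);
            m.invoke(obj);
            return true;
        } catch (NoSuchMethodException ex) {
            throw new IllegalStateException("no method " + invName, ex);
        } catch (InvocationTargetException ex) {
            // invoke() wraps whatever the invoked method threw.
            if (ex.getCause() instanceof InvariantError) { return false; }
            throw new IllegalStateException(ex.getCause());
        } catch (IllegalAccessException ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(checkValidity(new Counter(3)));
        System.out.println(checkValidity(new Counter(-1)));
    }
}
```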



2.3 Nonisomorphism 

At the end of the generation process, JMLAutoTest explores the space of generated
test cases to make sure that a candidate is not isomorphic to an existing test case.
JMLAutoTest does not define isomorphism itself. Our solution is based entirely
on the method equals defined in the input class.

Let Obj be the candidate object and let $S$ be the set of test cases generated so far. Obj is
isomorphic to an existing test case iff $\exists c \in S$: Obj.equals(c). The advantage of
this solution is that programmers can easily change the criterion, and thus obtain different
kinds of test cases, by modifying the equals method.
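
The filtering rule above can be sketched as follows. The class LenList is purely hypothetical: it treats two lists as equal when they have the same length, which shows how the choice of equals determines which candidates count as duplicates.

```java
import java.util.ArrayList;
import java.util.List;

final class NonIsoSketch {
    /** Hypothetical candidate type: two lists are "equal" if their lengths match. */
    static final class LenList {
        final int length;
        LenList(int length) { this.length = length; }
        @Override public boolean equals(Object o) {
            return o instanceof LenList && ((LenList) o).length == length;
        }
        @Override public int hashCode() { return Integer.hashCode(length); }
    }

    /** Adds obj to cases unless some existing case c satisfies obj.equals(c). */
    static boolean addIfNew(List<Object> cases, Object obj) {
        for (Object c : cases) {
            if (obj.equals(c)) { return false; } // isomorphic to an existing case
        }
        cases.add(obj);
        return true;
    }

    public static void main(String[] args) {
        List<Object> cases = new ArrayList<>();
        addIfNew(cases, new LenList(2));
        addIfNew(cases, new LenList(2)); // rejected as a duplicate
        addIfNew(cases, new LenList(3));
        System.out.println(cases.size());
    }
}
```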



3 Double-Phase Testing 

This section presents how JMLAutoTest avoids meaningless test cases. After the
test case space has been generated as described in the previous section, it may still
contain many meaningless test cases even though each individual candidate is valid. The
main idea of double-phase testing is to run two phases of testing based on statistics.

Double-phase testing is especially suited to black-box testing and to tests with a large
test case space, although it spends some time running the pre-test. If the test case space
is not very large (say, only a few dozen cases), double-phase testing is not needed.
Programmers decide whether to use this method by choosing whether to run the test
with the argument “-pre”. Running the test case provider (class *_JML_TestCase)
without any arguments means the test is run in the conventional way.



3.1 Making an Operational Profile 

The test client contains a method makeOperationalProfile. Programmers can modify
this method to divide the whole test case space into several partitions; JMLAutoTest
generates the skeleton code of this method. The test case spaces of the different types are
put into a hash table spaces in the test case provider, and this hash table is then passed to
the method makeOperationalProfile in the test client. For example, we can put linked lists
which contain more than three nodes into partition_LinkedList[0] and those
which contain fewer than three nodes into partition_LinkedList[1]. The variable
partition_LinkedList is a JMLObjectSequence array, each element of which represents a
partition of this test case space. Finally, the partitions of each type are put into a hash
table partitions, which is returned by this method.
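
A partitioning method of this kind might look roughly like the sketch below. It uses java.util collections in place of JMLObjectSequence and Hashtable, models each linked-list test case as an int[] of node values, and sends lists with exactly three nodes to partition 1; all three choices are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ProfileSketch {
    /** Each test case is modelled as an int[] of node values (an assumption). */
    static Map<String, List<List<int[]>>> makeOperationalProfile(
            Map<String, List<int[]>> spaces) {
        List<int[]> longer = new ArrayList<>();
        List<int[]> shorter = new ArrayList<>();
        for (int[] list : spaces.get("LinkedList")) {
            if (list.length > 3) { longer.add(list); } else { shorter.add(list); }
        }
        List<List<int[]>> partitionLinkedList = new ArrayList<>();
        partitionLinkedList.add(longer);  // partition 0
        partitionLinkedList.add(shorter); // partition 1
        Map<String, List<List<int[]>>> partitions = new HashMap<>();
        partitions.put("LinkedList", partitionLinkedList);
        return partitions;
    }

    /** Small demo: sizes of the two partitions for four sample lists. */
    static int[] demoSizes() {
        List<int[]> lists = new ArrayList<>();
        lists.add(new int[] { 1 });
        lists.add(new int[] { 1, 2, 3, 4 });
        lists.add(new int[] { 1, 2, 3 });
        lists.add(new int[] { 1, 2, 3, 4, 5 });
        Map<String, List<int[]>> spaces = new HashMap<>();
        spaces.put("LinkedList", lists);
        List<List<int[]>> parts = makeOperationalProfile(spaces).get("LinkedList");
        return new int[] { parts.get(0).size(), parts.get(1).size() };
    }

    public static void main(String[] args) {
        int[] sizes = demoSizes();
        System.out.println(sizes[0] + " " + sizes[1]);
    }
}
```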

3.2 The First Phase

During the first phase, a small number of test cases taken randomly from each partition
form several groups. The percentage of test cases to take from each partition is
provided by the programmer as the second argument of the main method (e.g., if the
arguments are “-pre 0.3”, the percentage is 30%). The tests are run with each group
separately. Figure 7 illustrates the generated code for the first-phase test. If the
pre-test has not been disabled by the programmer, the flag isPre (which shows
whether the test is a pre-test or a final test) is set to true.

Taking Test Cases Out from Partitions. The technique of getting these test cases is 
based on the following principle. 

Let $p$ be the percentage provided by the programmer. Let $P$ be a partition of a certain
test case space, and let $C$ be a subset of $P$. $C$ is the set of cases which should be
taken out iff $C.size = \lceil P.size \cdot p \rceil$ and $\forall n \in [0, C.size - 1]$:
C.elementAt(n) == P.elementAt($\lfloor n/p \rfloor$). The operator == is Java's comparison by object
identity. The definition above shows that the test cases are taken at evenly spaced
positions of the partition, i.e., according to a uniform distribution.
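
The selection rule can be sketched as below, with java.util.List standing in for JMLObjectSequence. The clamp on the index is an added safeguard against floating-point rounding and is not part of the definition.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SampleSketch {
    /** Picks ceil(|P| * p) elements; the n-th pick is P[floor(n / p)]. */
    static <T> List<T> take(List<T> partition, double p) {
        if (p <= 0.0 || p > 1.0) {
            throw new IllegalArgumentException("p must be in (0, 1]");
        }
        int count = (int) Math.ceil(partition.size() * p);
        List<T> chosen = new ArrayList<>();
        for (int n = 0; n < count; n++) {
            // Clamp to guard against rounding at the upper end (an added safeguard).
            int index = Math.min((int) Math.floor(n / p), partition.size() - 1);
            chosen.add(partition.get(index));
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<Integer> partition = Arrays.asList(0, 1, 2, 3, 4, 5, 6, 7);
        System.out.println(take(partition, 0.25));
    }
}
```

With eight cases and p = 0.25, two cases are taken, from positions 0 and 4.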

Algorithm Used in Running the Pre-test. As presented in Section 3.1, each partition
of the test case space of a certain type is represented by an element in a
JMLObjectSequence array. To record the number of meaningless test cases during the
first phase, we shift the elements of each partition back by one position and let the
first unit contain the number of meaningless cases. At first, we put zero into this
first unit.

Each time the test suite is run, the number of meaningless test cases taken from a
certain partition in the current run is added to the value in the first unit of that
partition. So during the first phase, the value in the first unit of a partition changes
several times. By comparing the final values in the first units of the partitions of the
same test case space, we get the approximate proportion of the meaningless cases in each
partition among all meaningless cases in the whole test case space.

Method runPreTest shown in Figure 7 is a recursive method which explores every
partition in the test case space of each type. We keep a hash table arg_ind which
represents a vector of indices of partitions, one for the test case space of each type. We
continue with the example method findSubList. Two test case spaces are generated, for
the types LinkedList and Node. If the test case space of LinkedList has been divided into
two partitions and the space of Node into three partitions, the test suite is run six
times in the first phase. Each time before the test suite is run, the value of arg_ind
is changed to show the indices of the partitions in these two test case spaces. At the
beginning, arg_ind is the vector (0, 0), which means that test cases of both types
LinkedList and Node are taken from partition 0 of the two test case spaces. During the
first recursion, arg_ind is changed to (0, 1), and the recursion ends when arg_ind
reaches (1, 2), which means all partitions have been visited. When the test suite is run,
the init_vT methods presented in the next section get the test cases from the different
partitions according to the vector represented by arg_ind.
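
The enumeration of index vectors can be sketched recursively as follows. This is only an illustration of the traversal order: the real runPreTest works on the hash table arg_ind and runs the test suite at each step, whereas this sketch uses a plain int[] and merely collects the vectors.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class PreTestSketch {
    /** counts[i] is the number of partitions of the i-th type. */
    static void enumerate(int[] counts, int pos, int[] current, List<int[]> out) {
        if (pos == counts.length) {
            out.add(current.clone()); // one pre-test run per complete vector
            return;
        }
        for (int i = 0; i < counts[pos]; i++) {
            current[pos] = i;
            enumerate(counts, pos + 1, current, out);
        }
    }

    static List<int[]> allVectors(int[] counts) {
        List<int[]> out = new ArrayList<>();
        enumerate(counts, 0, new int[counts.length], out);
        return out;
    }

    public static void main(String[] args) {
        // Two partitions for LinkedList, three for Node: six runs.
        for (int[] v : allVectors(new int[] { 2, 3 })) {
            System.out.println(Arrays.toString(v));
        }
    }
}
```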

// The following code appears in the test case provider
// (*_JML_TestCase.java) for testing the method "findSubList".

// an object of the test client
JMLTestClient myClient = new JMLTestClient();

// Initialize hashtable param in order to receive test cases
Hashtable param = new Hashtable();

// Get test cases from JMLTestClient
param.put("LinkedList", myClient.makeTestCases_LinkedList());

// Divide the whole test case space into several partitions
Hashtable args_findSubList = myClient.makeOperationalProfile(param);

// If the pre-test has not been disabled by the programmer
if (args.length == 2 && "-pre".equals(args[0])) {

    // A flag to show whether it is a pre-test or a final test
    boolean isPre = true;

    // Percentage of test cases which should be taken from each partition
    double percentage = Double.parseDouble(args[1]);

    // run the test suite several times
    runPreTest(percentage);
}

Fig. 7. Generated code in the test case provider (*_JML_TestCase) for the first-phase test

3.3 The Second Phase 

During the second phase, the test cases from the partitions are reorganized and the
final test is run.

Reorganization of Test Cases. After the first-phase test, the first unit of each
partition contains the number of all meaningless test cases taken from this
partition. Although this number might be more than the real one, it reflects the real
situation of each partition in a certain test case space. From these numbers we can
compute the proportion of test cases which should be taken from each partition of a
certain test case space in the final test. Let $S$ be the set of such proportions. Let $C$ be
the set of values contained in the first unit of each partition in a test case space. The
following algorithm is used in JMLAutoTest to get the test cases in the final test.

Algorithm: s = sum(C);
forall $n \in [0, C.size - 1]$: S.elementAt(n) = (s - C.elementAt(n)) / s;

Finally, in a certain test case space, let $T_1$ be the set of test cases taken from
partition $P_1$, $T_2$ the set of cases taken from partition $P_2$, ..., $T_n$ the set taken from
partition $P_n$. Taking test cases from $P_i$ is again based on an even distribution, with
the proportion S.elementAt(i-1). Then we have
$\forall i \in [1, n]$: $T_i.size = \lfloor P_i.size \cdot S.elementAt(i-1) \rfloor$. Let $T = T_1 \cup T_2 \cup \dots \cup T_n$. Then $T$
is the set of test cases to be supplied in the final test.
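
The computation of the proportions can be sketched as follows. Plain arrays replace the JML sequences, and the treatment of s = 0 (no meaningless case was recorded, so every partition is kept in full) is an assumption that the algorithm above does not spell out.

```java
import java.util.Arrays;

final class ProportionSketch {
    /** meaningless[n] is the count stored in the first unit of partition n. */
    static double[] proportions(int[] meaningless) {
        int s = 0;
        for (int c : meaningless) { s += c; }
        double[] result = new double[meaningless.length];
        for (int n = 0; n < meaningless.length; n++) {
            // Assumption: with s == 0 nothing was meaningless, so keep everything.
            result[n] = (s == 0) ? 1.0 : (double) (s - meaningless[n]) / s;
        }
        return result;
    }

    /** Number of cases to take from a partition: floor(size * proportion). */
    static int casesToTake(int partitionSize, double proportion) {
        return (int) Math.floor(partitionSize * proportion);
    }

    public static void main(String[] args) {
        double[] s = proportions(new int[] { 2, 6 });
        System.out.println(Arrays.toString(s));
        System.out.println(casesToTake(12, s[0]) + " " + casesToTake(12, s[1]));
    }
}
```

A partition that produced many meaningless cases in the pre-test therefore contributes proportionally fewer cases to the final test.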

/** The following method is defined in the test class and
    is overridden in the test case provider class. */
public void init_vLinkedList() {
    if (isPre) { // it is in the first phase
        /** Get the index of the partition from which test cases
            should be taken */
        int ind = ((Integer) arg_ind.get("LinkedList")).intValue();

        /** Initialize vLinkedList with cases from the partition
            whose index is represented by ind */

    }
    else { // it is a final test
        /** Initialize vLinkedList with cases from all partitions
            according to the obtained proportions. */

    }
}

Fig. 8. The method init_vT in the test case provider class

4 Test Oracle Generation 

JMLAutoTest generates test oracles in the same way as the JML+JUnit testing
framework [2], which combines JML [3] and JUnit [7] to test individual methods.

4.1 Setting up Test Fixture 

The test fixture for the class C is defined as:

C[] receivers; T1[] vT1; ...; Tn[] vTn;

The first array, named receivers, is for the set of receiver objects (i.e., objects to be
tested) and the rest are for argument objects.

The receiver and argument objects are initialized by the methods init_receivers and
init_vTi in the test case provider class. Figure 8 describes the generated code in method
init_vLinkedList for initializing vLinkedList. In the first phase, test cases are
taken from the partition whose index is given by the value in the hash table arg_ind;
in the second phase, test cases are taken from all partitions and mixed together.

4.2 Testing a Method 

For each instance (i.e., non-static) method of the form:

T M(A1 a1, ..., An an) throws E1, ..., Em { /* ... */ }

of the class C, a corresponding test method testM is generated in the test class
C_JML_Test. Let N be the value of vT1.length * vT2.length * ... * vTn.length. Then the
method to be tested is executed N times, until each element in each array vTi has
been visited. The pre-condition of the target method is checked first. If the pre-condition
is violated and the current test is the pre-test (isPre == true), the variable
meaningless shown in Figure 8 is increased. If the post-condition of the method is
violated, JMLAutoTest handles it differently in the two phases. During the first phase,
this exception is ignored and the test continues, since we only want to know the number
of meaningless test cases and do not care whether the execution fails or succeeds.
During the second phase, this exception is thrown to let the programmer
know that the execution of the method is not correct.



5 Experimental Results 

This section presents the performance of JMLAutoTest when testing a method. To monitor
the process of test case generation and of testing a method, JMLAutoTest uses a class
JMLTestDataStat to record some key data. As the benchmark we use the method
findSubTree(BinaryTree parentTree, Node thisNode), which finds the subtree of
parentTree whose root is represented by thisNode.

5.1 Generating Test Cases and Dividing Test Case Spaces 

Figure 9 shows the definition of the classes BinaryTree and Node. The invariant clause
states that if root is null, the size must be 0, and that if root is not null, the size must
equal the total number of nodes in the tree. Figure 10 shows the JML specification
of the method public BinaryTree findSubTree(BinaryTree parentTree, Node thisNode).
The pre-condition of the method requires that neither of its arguments is
null and that there is a node in parentTree whose ID equals the ID of thisNode.

We generate the test case space of type BinaryTree with a few nodes whose IDs
range from 5 to 8. We also generate the test case space of type Node, which contains 12
nodes whose IDs range from 0 to 11.

We do not divide the test case space of type BinaryTree and leave it as the
only partition. We divide the space of type Node into two partitions: the first
contains the nodes whose IDs range from 0 to 5 and the second contains the rest.

5.2 Test Results 

Table 1 shows JMLAutoTest's performance when we test the method with binary
trees containing from 5 to 8 nodes. We generate the test case space of BinaryTree in
the fast way, so the number of candidates considered is close to the number of test cases
generated. We use the arguments “-pre 0.25” to run the test. Note that for all kinds of
binary trees listed in Table 1, almost all test cases in the final test are meaningful.

public class BinaryTree {
    //@ public invariant (root == null && getSize() == 0)
    //@     || (root != null && getSize() != 0
    //@         && toObjectSet(root).size() == getSize());

    public Node root;
    protected int size;
    public int getSize() {...}
    public JMLObjectSequence toObjectSet() {...}
    ...
}

public class Node {
    public Node left;
    public Node right;
    public int ID;
}

Fig. 9. The definition of the classes BinaryTree and Node

/*+@ public normal_behavior
  @ requires parentTree != null && thisNode != null &&
  @     (\exists Node n; parentTree.toObjectSet().has(n);
  @         n.ID == thisNode.ID);
  @ assignable \nothing;
  @ ensures \result.root.ID == thisNode.ID;
  @+*/

Fig. 10. The pre-condition of method findSubTree

We then compare the performance of double-phase testing with that of the
conventional way (Table 2). Note that for all binary trees with more than
five nodes, the total time of the double-phase test is less than the corresponding
time of the conventional test, and the more test cases there are, the more time
double-phase testing saves. Although some meaningful test cases are also filtered out in
double-phase testing, the test case space is still large enough to test the correctness of
the method.
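
As a worked illustration, the relative saving in total time can be computed from the total times reported in Table 2 as (conventional - double-phase) / conventional. The percentages below are derived here by simple arithmetic and rounded; they are not part of the reported measurements.

```latex
\begin{align*}
\text{8 nodes: } & \frac{3.484 - 2.016}{3.484} = \frac{1.468}{3.484} \approx 42.1\% \\
\text{7 nodes: } & \frac{1.25 - 0.766}{1.25} = \frac{0.484}{1.25} \approx 38.7\% \\
\text{6 nodes: } & \frac{0.468 - 0.422}{0.468} = \frac{0.046}{0.468} \approx 9.8\% \\
\text{5 nodes: } & \frac{0.219 - 0.266}{0.219} = \frac{-0.047}{0.219} \approx -21.5\%
\end{align*}
```

The negative value for five nodes reflects that the pre-test overhead outweighs the saving when the test case space is small.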



6 Related Work 

There are now quite a few testing facilities and approaches based on formal
specifications, developed and advocated by many different research groups. One of the
earliest papers, by Goodenough and Gerhart [8], presents the importance of such testing.
Approaches like automated test case generation from UML statecharts [9,21] and from
Z specifications [10,20] present ways of generating test cases from formal
specifications. There are also tools which can generate Java test cases, like the TestEra
framework [11,22], which requires programmers to learn a specification language from
which test cases are generated. None of these approaches generates test cases
for Java programs annotated with JML specifications, although JML is widely accepted
as a specification language tailored to Java for keeping programs correct.

Several researchers noticed that if a program is formally specified, it should be
possible to use the specification as a test oracle [12, 13, 14]. Cheon and Leavens [2]
present the JMLUnit framework, which can generate test oracles for JUnit [17] from
JML specifications. This framework uses the JML runtime assertion checker to decide
whether methods are working correctly, thus automating the writing of unit test
oracles. However, it does not automate the generation of test cases, which remains
labor-intensive for programmers.



Table 1. Performance of JMLAutoTest for testing the method findSubTree with arguments
“-pre 0.25” (test cases are generated in the fast way).

nodes in      generated      candidates   meaningful cases    total test cases
binary tree   binary trees   considered   in the final test   in the final test
5             42             64           410                 492
6             132            196          1572                1572
7             429            625          5136                5136
8             1430           2055         17148               17148



Table 2. Performance comparison between the double-phase testing in JMLAutoTest and the
conventional way in the JMLUnit testing framework.

              Double-phase way                                  Conventional way
nodes in      meaningful/total   time in first   total time     meaningful/total   time in first   total time
binary tree   in final test      phase (s)       of test (s)    in final test      phase (s) ²     of test (s)
5             410/492            0.079           0.266          420/1008           0               0.219
6             1572/1572          0.188           0.422          1584/3168          0               0.468
7             5136/5136          0.36            0.766          6006/10296         0               1.25
8             17148/17148        0.703           2.016          22880/34320        0               3.484


Boyapati, Khurshid and Marinov describe Korat [4], which automates the
generation of test cases for Java programs with formal specifications. Korat generates
linked data structures based on additional Java predicates. However, Korat requires
the programmer who runs the test to know the details of the program under test
well; therefore it is not suited to black-box testing. Also, it cannot keep
meaningless test cases from being handled.

There are quite a few approaches to applying statistical models to testing
[16,18,19]. Statistical testing has been widely adopted in the development of
Cleanroom software [5] for test case acquisition, result evaluation and reliability
modeling. So using statistical analysis in testing is not a new idea. The novelty of
JMLAutoTest lies in applying statistical analysis to filtering out meaningless
cases. This idea can also be used in testing programs written in other
languages.



² All times in this column are zero because there is no first-phase test in conventional
testing.

7 Conclusions

This paper presents JMLAutoTest, a novel testing framework designed for Java
programs annotated with JML specifications.

JMLAutoTest automatically generates three classes for a target method. In the test
client, testers can easily generate test cases for any kind of type, including linked data
structures and common types, in either a fast way or a normal way.
JMLAutoTest verifies the validity of a candidate by checking its invariant with the JML
runtime assertion checker.

JMLAutoTest provides double-phase testing for the test of a method. It is
a statistics-based testing approach which filters out meaningless test cases without
requiring testers to know the details of the method to be tested. According to the
operational profile made by the tester, the generated test case space is divided into several
partitions. During the first phase, a small number of test cases is taken from each
partition. Then the test suite is run several times to record the number of meaningless
cases of each group. Based on statistical principles, we estimate the approximate
proportion of meaningless test cases in each partition. During the second phase,
test cases taken from each partition according to these calculated
proportions are mixed together and supplied to the test. The time otherwise spent
visiting meaningless test cases in the final test is saved.

Abstract. Compositional testing concerns the testing of systems that consist of
communicating components which can also be tested in isolation. Examples are
component based testing and interoperability testing. We show that, with certain
restrictions, the ioco-test theory for conformance testing is suitable for
compositional testing, in the sense that the integration of fully conformant components
is guaranteed to be correct. As a consequence, there is no need to re-test the
integrated system for conformance.

This result is also relevant for testing in context, since it implies that every failure 
of a system embedded in a test context can be reduced to a fault of the system 
itself. 



1 Introduction 

In this paper we study formal testing based on the ioco-test theory. This theory works 
on labeled transition systems (LTS) [1,2]. The name ioco, which stands for input/ 
output conformance, refers to the implementation relation (i.e., notion of correctness) 
on which the theory and the test generation algorithm have been built. A number of 
tools are based on the ioco theory, among which there are TGV [3], TestGen [4] and 
TorX [5]. 

Two open issues in testing theory in general, and the ioco-theory in particular, are 
compositional testing and testing in context. For instance, for the testing theory based 
on Finite-State-Machines (FSM) this issue has been studied in [6]. 

Compositional testing considers the testing of communicating components that to- 
gether form a larger system. An example is component based testing, i.e., integration 
testing of components that have already been tested separately. An example from the 
telecom sector is interoperability testing, i.e., testing if systems from different manu- 
facturers, that should comply with a certain standard, work together; for example GSM 
mobile phones. The question is what can be concluded from the individual tests of the 
separate components, and what should be (re)tested on the integration or system level. 
With the current theory it is unclear what the relation between the correctness of the 
components and the integrated system is. 

* This research was supported by Ordina Finance and by the Dutch research programme
PROGRESS under project TES5417: Atomyste - ATOm splitting in eMbedded sYStems
TEsting.

A. Petrenko and A. Ulrich (Eds.): FATES 2003, LNCS 2931, pp. 86-100, 2004. 

Another scenario, with similar characteristics, is testing in context. This refers to
the situation that a tester can only access the implementation under test (IUT) through a
test context [7-9]. The test context interfaces between the implementation under test and
the tester. As a consequence the tester can only indirectly observe and control the IUT via
the test context. This makes testing weaker, in the sense that there are fewer possibilities
for observation and control of the IUT. With testing in context, the question is whether
faults in the IUT can be detected by testing the composition of IUT and test context, and
whether a failure of this composition always indicates a fault of the IUT. This question
is the converse of compositional testing: when testing in context we wish to detect
errors in the IUT (a component) by testing it in composition with the test context,
whereas in compositional testing we wish to infer correctness of the integrated system
from conformance of the individual components.

This paper studies the above-mentioned compositionality properties of ioco for two
operations on labeled transition systems: parallel composition and hiding. If ioco has
this compositionality property for these operations, it follows that correctness of the
parts (the components) implies correctness of the whole (the integrated system), or that
a fault in the whole (IUT and test context) implies a fault in the component (IUT). This
compositionality property is formally called a pre-congruence.

We show that ioco is a pre-congruence for parallel composition and hiding in the
absence of underspecification of input actions. One way to satisfy this condition is to
only allow specifications which are input enabled. Another way is to make the
underspecification explicit by completion. We show that, in particular, demonic completion
is suitable for this purpose. As a final result we show how to use the original
(uncompleted) specifications and still satisfy the pre-congruence property. This leads to a
new implementation relation, baptized $\text{ioco}_U$, which is slightly weaker than ioco.

This paper has two main results. First we show a way to handle underspecifica- 
tion of input actions when testing communicating components with the ioco theory. 
This idea is new for LTS testing. It is inspired by [10] and similar work done in FSM 
testing [11]. Second we establish a formal relation between the components and the 
integrated system. As far as we know this result is new for both LTS testing and FSM 
testing. 

Overview. The next section recalls some basic concepts and definitions about transition 
systems and ioco. Section 3 sets the scene and formalizes the problems of composi- 
tional testing and testing in context. Section 4 studies the pre-congruence properties of 
ioco for parallel composition and hiding. Section 5 discusses underspecification, and 
approaches to complete specifications with implicit underspecification. Section 6 con- 
cludes with some final remarks and an assessment of the results. For a full version of 
this paper with all the proofs, we refer to [12]. 

2 Formal Preliminaries 

This section recalls the aspects of the theory behind ioco that are used in this paper; 
see [1] for a more detailed exposition. 

Labeled Transition Systems. A labeled transition system (LTS) description is defined
in terms of states and labeled transitions between states, where the labels indicate what
happens during the transition. Labels are taken from a global set $\mathbf{L}$. We use a special
label $\tau \notin \mathbf{L}$ to denote an internal action. For arbitrary $L \subseteq \mathbf{L}$, we use $L_\tau$ as a shorthand
for $L \cup \{\tau\}$. We deviate from the standard definition of labeled transition systems in
that we assume the label set of an LTS to be partitioned in an input and an output set.

Definition 1. A labeled transition system is a 5-tuple $\langle Q, I, U, T, q_0 \rangle$ where $Q$ is a
non-empty countable set of states; $I \subseteq \mathbf{L}$ is the countable set of input labels; $U \subseteq \mathbf{L}$ is the
countable set of output labels, which is disjoint from $I$; $T \subseteq Q \times (I \cup U \cup \{\tau\}) \times Q$
is a set of triples, the transition relation; $q_0 \in Q$ is the initial state.

We use $L$ as shorthand for the entire label set ($L = I \cup U$); furthermore, we use $Q_p$, $I_p$
etc. to denote the components of an LTS $p$. We commonly write $q \xrightarrow{\lambda} q'$ for $(q, \lambda, q') \in T$.
Since the distinction between inputs and outputs is important, we sometimes use a
question mark before a label to denote input and an exclamation mark to denote output.
We denote the class of all labeled transition systems over $I$ and $U$ by $\mathcal{LTS}(I, U)$. We
represent a labeled transition system in the standard way, by a directed, edge-labeled
graph where nodes represent states and edges represent transitions.

A state that cannot do an internal action is called stable. A state that cannot do an
output or internal action is called quiescent. We use the symbol $\delta$ ($\notin \mathbf{L}_\tau$) to represent
quiescence: that is, $p \xrightarrow{\delta} p$ stands for the absence of any transition $p \xrightarrow{\lambda} p'$ with $\lambda \in U_\tau$.
For an arbitrary $L \subseteq \mathbf{L}_\tau$, we use $L_\delta$ as a shorthand for $L \cup \{\delta\}$.
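
The definitions of a labeled transition system, stability and quiescence can be made concrete with a small sketch. The representation below (string labels, a reserved label "tau" for the internal action, states given implicitly by the transitions) and the coffee-machine example are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class LtsSketch {
    static final String TAU = "tau"; // internal action; assumed not to be a visible label

    final Set<String> inputs = new HashSet<>();
    final Set<String> outputs = new HashSet<>();
    final List<String[]> transitions = new ArrayList<>(); // rows {from, label, to}
    final String initial;

    LtsSketch(String initial) { this.initial = initial; }

    void add(String from, String label, String to) {
        if (!label.equals(TAU) && !inputs.contains(label) && !outputs.contains(label)) {
            throw new IllegalArgumentException("unknown label " + label);
        }
        transitions.add(new String[] { from, label, to });
    }

    /** Stable: no internal transition leaves q. */
    boolean isStable(String q) {
        for (String[] t : transitions) {
            if (t[0].equals(q) && t[1].equals(TAU)) { return false; }
        }
        return true;
    }

    /** Quiescent: neither an output nor an internal transition leaves q. */
    boolean isQuiescent(String q) {
        for (String[] t : transitions) {
            if (t[0].equals(q) && (t[1].equals(TAU) || outputs.contains(t[1]))) {
                return false;
            }
        }
        return true;
    }

    /** Hypothetical example: s0 -?coin-> s1 -!coffee-> s0. */
    static LtsSketch coffeeMachine() {
        LtsSketch m = new LtsSketch("s0");
        m.inputs.add("coin");
        m.outputs.add("coffee");
        m.add("s0", "coin", "s1");
        m.add("s1", "coffee", "s0");
        return m;
    }

    public static void main(String[] args) {
        LtsSketch m = coffeeMachine();
        System.out.println(m.isQuiescent("s0") + " " + m.isQuiescent("s1"));
    }
}
```

In the example, s0 is quiescent because it only waits for input, while s1 is stable but not quiescent because it can produce an output.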

An LTS is called strongly responsive if it always eventually enters a quiescent state;
in other words, if it does not have any infinite $U_\tau$-labeled paths. For technical reasons
we restrict $\mathcal{LTS}(I, U)$ to strongly responsive transition systems. Systems that are not
strongly responsive may show live-locks (or develop live-locks by hiding actions). So
one can argue that it is a favorable property if a specification is strongly responsive.
However, from a practical perspective it would be nice if the constraint could be relaxed.
This is probably possible, but needs further research.
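
For a finite-state system, the absence of infinite paths of output and internal transitions amounts to the absence of cycles among those transitions, which can be checked by a depth-first search. The sketch below assumes finitely many states and takes as input only the output- and tau-labeled edges; the helper edges and the state names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class ResponsiveSketch {
    /** Builds a graph from alternating (from, to) names; output/tau edges only. */
    static Map<String, List<String>> edges(String... fromTo) {
        Map<String, List<String>> g = new HashMap<>();
        for (int i = 0; i + 1 < fromTo.length; i += 2) {
            g.computeIfAbsent(fromTo[i], k -> new ArrayList<>()).add(fromTo[i + 1]);
        }
        return g;
    }

    /** True iff the graph of output and internal transitions has no cycle. */
    static boolean stronglyResponsive(Map<String, List<String>> g) {
        Set<String> visited = new HashSet<>();
        Set<String> onPath = new HashSet<>();
        for (String q : g.keySet()) {
            if (hasCycle(q, g, visited, onPath)) { return false; }
        }
        return true;
    }

    private static boolean hasCycle(String q, Map<String, List<String>> g,
                                    Set<String> visited, Set<String> onPath) {
        if (onPath.contains(q)) { return true; }  // back edge: cycle found
        if (!visited.add(q)) { return false; }    // already fully explored
        onPath.add(q);
        for (String next : g.getOrDefault(q, Collections.emptyList())) {
            if (hasCycle(next, g, visited, onPath)) { return true; }
        }
        onPath.remove(q);
        return false;
    }

    public static void main(String[] args) {
        System.out.println(stronglyResponsive(edges("s1", "s0")));
        System.out.println(stronglyResponsive(edges("a", "b", "b", "a")));
    }
}
```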

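A trace is a finite sequence of observable actions. The set of all traces over $L$ ($\subseteq \mathbf{L}$)
is denoted by $L^*$, ranged over by $\sigma$, with $\epsilon$ denoting the empty sequence. If $\sigma_1, \sigma_2 \in L^*$,
then $\sigma_1 \sigma_2$ is the concatenation of $\sigma_1$ and $\sigma_2$. We use the standard notation with single
and double arrows for traces: $q \xrightarrow{a_1 \cdots a_n} q'$ denotes $q \xrightarrow{a_1} \cdots \xrightarrow{a_n} q'$, $q \stackrel{\epsilon}{\Rightarrow} q'$ denotes
$q \xrightarrow{\tau \cdots \tau} q'$ and $q \stackrel{a_1 \cdots a_n}{\Longrightarrow} q'$ denotes $q \stackrel{\epsilon}{\Rightarrow} \xrightarrow{a_1} \stackrel{\epsilon}{\Rightarrow} \cdots \xrightarrow{a_n} \stackrel{\epsilon}{\Rightarrow} q'$ (where $a_i \in L_{\tau\delta}$).

We will not always distinguish between a labeled transition system and its initial
state. We will identify the process $p = \langle Q, I, U, T, q_0 \rangle$ with its initial state $q_0$, and we
write, for example, $p \stackrel{\sigma}{\Rightarrow} q_1$ instead of $q_0 \stackrel{\sigma}{\Rightarrow} q_1$.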

Input-Output Transition Systems. An input-output transition system (IOTS) is a
labeled transition system that is completely specified for input actions. The class of
input-output transition systems with input actions in $I$ and output actions in $U$ is denoted by
$\mathcal{IOTS}(I, U)$ ($\subseteq \mathcal{LTS}(I, U)$). Notice that we do not require IOTSs to be strongly
responsive.

Definition 2. An input-output transition system $p = \langle Q, I, U, T, q_0 \rangle$ is a labeled
transition system for which all inputs are enabled in all states: $\forall q \in Q, a \in I : q \stackrel{a}{\Rightarrow}$
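
A check for input-enabledness of a finite system can be sketched as below. The sketch assumes the weak reading of "enabled", namely that an input may be preceded by internal steps, and therefore computes the set of states reachable via tau before looking for the input; under a strict reading the closure step would simply be dropped. The array-based representation is an assumption for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

final class IotsSketch {
    static final String TAU = "tau"; // internal action; assumed not to be a visible label

    /** States reachable from q by zero or more internal steps; rows are {from, label, to}. */
    static Set<String> tauClosure(String q, String[][] transitions) {
        Set<String> seen = new HashSet<>();
        Deque<String> todo = new ArrayDeque<>();
        seen.add(q);
        todo.push(q);
        while (!todo.isEmpty()) {
            String cur = todo.pop();
            for (String[] t : transitions) {
                if (t[0].equals(cur) && t[1].equals(TAU) && seen.add(t[2])) {
                    todo.push(t[2]);
                }
            }
        }
        return seen;
    }

    /** True iff every input is possible in every state, possibly after internal steps. */
    static boolean inputEnabled(String[] states, String[] inputs, String[][] transitions) {
        for (String q : states) {
            Set<String> reachable = tauClosure(q, transitions);
            for (String a : inputs) {
                boolean enabled = false;
                for (String[] t : transitions) {
                    if (t[1].equals(a) && reachable.contains(t[0])) {
                        enabled = true;
                        break;
                    }
                }
                if (!enabled) { return false; }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] states = { "s0", "s1" };
        String[] inputs = { "coin" };
        String[][] machine = { { "s0", "coin", "s1" }, { "s1", "coffee", "s0" } };
        System.out.println(inputEnabled(states, inputs, machine));
    }
}
```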

Composition of Labeled Transition Systems. The integration of components can be
modeled algebraically by putting the components in parallel while synchronizing their
common actions, possibly with internalizing (hiding) the synchronized actions. In
process algebra, the synchronization and internalization are typically regarded as two
separate operations. The synchronization of the processes $p$ and $q$ is denoted by $p \| q$. The
internalization of a label set $V$ in process $p$, or hiding $V$ in $p$ as it is commonly called,
is denoted by hide $V$ in $p$. Below we give the formal definition.

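Definition 3. For $i = 1, 2$ let $p_i = \langle Q_i, I_i, U_i, T_i, p_i \rangle$ be a transition system.

• If $I_1 \cap I_2 = U_1 \cap U_2 = \emptyset$ then $p_1 \| p_2 =_{def} \langle Q, I, U, T, p_1 \| p_2 \rangle$ where
  ◦ $Q = \{ q_1 \| q_2 \mid q_1 \in Q_1, q_2 \in Q_2 \}$;
  ◦ $I = (I_1 \setminus U_2) \cup (I_2 \setminus U_1)$;
  ◦ $U = U_1 \cup U_2$;
  ◦ $T$ is the minimal set satisfying the following inference rules ($\mu \in L_\tau$):
      $q_1 \xrightarrow{\mu} q_1', \ \mu \notin L_2 \ \vdash \ q_1 \| q_2 \xrightarrow{\mu} q_1' \| q_2$
      $q_2 \xrightarrow{\mu} q_2', \ \mu \notin L_1 \ \vdash \ q_1 \| q_2 \xrightarrow{\mu} q_1 \| q_2'$
      $q_1 \xrightarrow{\mu} q_1', \ q_2 \xrightarrow{\mu} q_2', \ \mu \neq \tau \ \vdash \ q_1 \| q_2 \xrightarrow{\mu} q_1' \| q_2'$

• If $V \subseteq U_1$, then hide $V$ in $p_1 =_{def} \langle Q, I_1, U_1 \setminus V, T, \text{hide } V \text{ in } p_1 \rangle$ where
  ◦ $Q = \{ \text{hide } V \text{ in } q_1 \mid q_1 \in Q_1 \}$;
  ◦ $T$ is the minimal set satisfying the following inference rules ($\mu \in L_\tau$):
      $q_1 \xrightarrow{\mu} q_1', \ \mu \notin V \ \vdash \ \text{hide } V \text{ in } q_1 \xrightarrow{\mu} \text{hide } V \text{ in } q_1'$
      $q_1 \xrightarrow{\mu} q_1', \ \mu \in V \ \vdash \ \text{hide } V \text{ in } q_1 \xrightarrow{\tau} \text{hide } V \text{ in } q_1'$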

Note that these constructions are only partial, there are constraints on the input and 
output sets. Moreover, parallel composition may give rise to an LTS that is not strongly 
responsive, even if the components are. For the time being, we do not try to analyze 
this but implicitly restrict ourselves to cases where the parallel composition is strongly 
responsive (thus, this is another source of partiality of the construction). 
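
The two constructions can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `LTS` class, the label name `"tau"`, and the two toy components are assumptions made for illustration, and states of the hidden system are kept as they are rather than renamed. Strong responsiveness is not checked.

```python
from dataclasses import dataclass

TAU = "tau"  # assumed name for the internal action

@dataclass(frozen=True)
class LTS:
    inputs: frozenset
    outputs: frozenset
    trans: frozenset  # triples (source, label, target)
    init: object

    @property
    def labels(self):
        return self.inputs | self.outputs

    @property
    def states(self):
        return {self.init} | {s for (a, _, b) in self.trans for s in (a, b)}

def parallel(p, q):
    """p || q: synchronize on common labels, interleave on all others."""
    if p.inputs & q.inputs or p.outputs & q.outputs:
        raise ValueError("composition undefined: input or output sets overlap")
    trans = set()
    for (a, mu, a2) in p.trans:
        if mu not in q.labels:            # first rule: p moves alone (tau included)
            trans |= {((a, b), mu, (a2, b)) for b in q.states}
    for (b, mu, b2) in q.trans:
        if mu not in p.labels:            # second rule: q moves alone
            trans |= {((a, b), mu, (a, b2)) for a in p.states}
    for (a, mu, a2) in p.trans:           # third rule: joint move on a shared label
        for (b, nu, b2) in q.trans:
            if mu == nu and mu != TAU:
                trans.add(((a, b), mu, (a2, b2)))
    return LTS((p.inputs - q.outputs) | (q.inputs - p.outputs),
               p.outputs | q.outputs, frozenset(trans), (p.init, q.init))

def hide(V, p):
    """hide V in p: relabel the outputs in V to tau."""
    if not V <= p.outputs:
        raise ValueError("only output actions can be hidden")
    trans = frozenset((a, TAU if mu in V else mu, a2) for (a, mu, a2) in p.trans)
    return LTS(p.inputs, p.outputs - V, trans, p.init)

# Two toy components (labels are illustrative, not taken from the paper's figures).
mon = LTS(frozenset({"1.00"}), frozenset({"make_coffee"}),
          frozenset({(0, "1.00", 1), (1, "make_coffee", 0)}), 0)
drk = LTS(frozenset({"make_coffee"}), frozenset({"coffee"}),
          frozenset({(0, "make_coffee", 1), (1, "coffee", 0)}), 0)
cof = hide(frozenset({"make_coffee"}), parallel(mon, drk))
```

In this sketch the shared label remains an output of the composition and only becomes internal once `hide` is applied, mirroring the separation of synchronization and internalization described above.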

In this paper we restrict ourselves to binary parallel composition. N-ary parallel 
composition may be an interesting extension. One may wonder however what this 
means in our input output setting, since an output action is uniquely identified by its 
sender. From this perspective only the synchronization of many receivers to one sender 
(broadcast) seems an interesting extension. 

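Proposition 1. Let $p, q \in \mathcal{LTS}(I_i, U_i)$ for $i = p, q$, with
$I_p \cap I_q = U_p \cap U_q = \emptyset$, and let $V \subseteq U_p$.

1. If $p \parallel q$ is strongly responsive then
   $p \parallel q \in \mathcal{LTS}((I_p \setminus U_q) \cup (I_q \setminus U_p), U_p \cup U_q)$;
   moreover, $p \parallel q \in \mathcal{IOTS}$ if $p, q \in \mathcal{IOTS}$.

2. $\mathbf{hide}\ V\ \mathbf{in}\ p \in \mathcal{LTS}(I_p, U_p \setminus V)$; moreover,
   $\mathbf{hide}\ V\ \mathbf{in}\ p \in \mathcal{IOTS}$ if $p \in \mathcal{IOTS}$.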

Conformance. The testing scenario on which ioco is based assumes that two things
are given: 1) an LTS constituting a specification of required behavior, and 2) an
implementation under test (IUT). We treat the IUT as a black box. In order to reason
about it we assume it can be modeled as an IOTS (an IUT is an object in the real world).
This assumption is referred to as the test hypothesis [7]. We want to stress that we do
not need to have this model when testing the IUT. We only assume that the
implementation behaves as an IOTS.

Given a specification $s$ and an (assumed) model of the IUT $i$, the relation
$i \mathrel{\mathbf{ioco}} s$ expresses that $i$ conforms to $s$. Whether this holds is
decided on the basis of the suspension traces of $s$: it must be the case that, after any
such trace $\sigma$, every output action (and also quiescence) that $i$ is capable of
should be allowed according to $s$. This is formalized by defining
$p\ \mathbf{after}\ \sigma$ (the set of states that can be reached in $p$ after the
suspension trace $\sigma$), $out(p)$ (the set of output and $\delta$-actions of $p$) and
$Straces(p)$ (the suspension traces of $p$).

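Definition 4. Let $p, s \in \mathcal{LTS}(I, U)$, let $i \in \mathcal{IOTS}(I, U)$, let
$P \subseteq Q_p$ be a set of states in $p$ and let $\sigma \in L_\delta^*$.

1. $p\ \mathbf{after}\ \sigma =_{def} \{ p' \mid p \stackrel{\sigma}{\Longrightarrow} p' \}$

2. $out(p) =_{def} \{ x \in U \mid p \xrightarrow{x} \} \cup \{ \delta \mid p \xrightarrow{\delta} \}$

3. $out(P) =_{def} \bigcup \{ out(p) \mid p \in P \}$

4. $Straces(p) =_{def} \{ \sigma \in L_\delta^* \mid p \stackrel{\sigma}{\Longrightarrow} \}$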

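The following defines the implementation relation ioco, modulo a function
$\mathcal{F}$ that generates a set of test traces from a specification. In this definition
$2^X$ denotes the powerset of $X$, for an arbitrary set $X$.

Definition 5. Given a function $\mathcal{F} : \mathcal{LTS}(I, U) \to 2^{L_\delta^*}$, we
define the implementation relation
$\mathbf{ioco}_{\mathcal{F}} \subseteq \mathcal{IOTS}(I, U) \times \mathcal{LTS}(I, U)$ as follows:

$i \mathrel{\mathbf{ioco}_{\mathcal{F}}} s \iff \forall \sigma \in \mathcal{F}(s) : out(i\ \mathbf{after}\ \sigma) \subseteq out(s\ \mathbf{after}\ \sigma)$

So $i \mathrel{\mathbf{ioco}_{Straces}} s$ means
$\forall \sigma \in Straces(s) : out(i\ \mathbf{after}\ \sigma) \subseteq out(s\ \mathbf{after}\ \sigma)$.
We use ioco as an abbreviation for $\mathbf{ioco}_{Straces}$. For more detailed
information about ioco we refer to [1].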
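
To make Definitions 4 and 5 concrete, the following Python sketch computes `after` and `out` on finite transition systems and checks the ioco inclusion for all suspension traces up to a fixed length. It is only an approximation under stated assumptions: the bound `depth`, the label names `"tau"` and `"delta"`, the representation of a model as a pair (set of transition triples, initial state), and the example models are all choices made here, and a bounded check can miss violations that only show up on longer traces.

```python
TAU, DELTA = "tau", "delta"  # assumed names for the internal action and quiescence

def tau_closure(trans, states):
    seen, todo = set(states), list(states)
    while todo:
        q = todo.pop()
        for (a, mu, b) in trans:
            if a == q and mu == TAU and b not in seen:
                seen.add(b)
                todo.append(b)
    return frozenset(seen)

def quiescent(trans, outputs, q):
    # A state is quiescent if it enables neither an output nor an internal action.
    return not any(a == q and (mu == TAU or mu in outputs) for (a, mu, _) in trans)

def step(trans, outputs, states, label):
    """States reached from `states` by one observable label (or by delta)."""
    if label == DELTA:
        nxt = {q for q in states if quiescent(trans, outputs, q)}
    else:
        nxt = {b for (a, mu, b) in trans if a in states and mu == label}
    return tau_closure(trans, nxt)

def after(trans, outputs, init, sigma):
    states = tau_closure(trans, {init})
    for label in sigma:
        states = step(trans, outputs, states, label)
    return states

def out(trans, outputs, states):
    res = {mu for (a, mu, _) in trans if a in states and mu in outputs}
    if any(quiescent(trans, outputs, q) for q in states):
        res.add(DELTA)
    return res

def straces(trans, inputs, outputs, init, depth):
    """Suspension traces of length <= depth."""
    alphabet = sorted(inputs | outputs) + [DELTA]
    frontier, result = [((), tau_closure(trans, {init}))], []
    for _ in range(depth + 1):
        result += [sigma for sigma, _ in frontier]
        frontier = [(sigma + (label,), nxt)
                    for sigma, states in frontier for label in alphabet
                    if (nxt := step(trans, outputs, states, label))]
    return result

def ioco_bounded(impl, spec, inputs, outputs, depth):
    (ti, qi), (ts, qs) = impl, spec
    for sigma in straces(ts, inputs, outputs, qs, depth):
        got = out(ti, outputs, after(ti, outputs, qi, sigma))
        allowed = out(ts, outputs, after(ts, outputs, qs, sigma))
        if not got <= allowed:
            return False
    return True

# Illustrative models: the specification is silent about a second input "a".
INPUTS, OUTPUTS = {"a"}, {"x", "y"}
SPEC = ({(0, "a", 1), (1, "x", 2)}, 0)
GOOD = ({(0, "a", 1), (1, "a", 1), (1, "x", 2), (2, "a", 2)}, 0)
BAD = ({(0, "a", 1), (1, "a", 1), (1, "y", 2), (2, "a", 2)}, 0)
```

Here `GOOD` adds self-loops for inputs the specification leaves open, which the relation permits, whereas `BAD` produces `y` where only `x` is allowed.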

3 Approach 

In this section we want to clarify compositional testing with the formal framework pre- 
sented in the previous section. The consequences for testing in context will be discussed 
in the final section. 

We study systems that consist of communicating components. These components 
can be tested individually and while working together (in the case of testing in context 
the components are the IUT and its test context). The behavior of such a system is
described by the parallel composition of the individual transition systems. Output actions
of one component that are in the input label set of another component are synchronized, 
resulting in a single, internal transition of the overall system. Actions of a component 
that are not in the label set of another component are not synchronized, resulting in a 
single observable transition of the overall system. This gives rise to the scenario de- 
picted in Figure 1. The figure will be explained in the next example. 

3.1 Example 

To illustrate compositional testing, we use two components of a coffee machine: a 
“money component” (mon) that handles the inserted coins and a “drink component” 
(drk) that takes care of preparing and pouring the drinks; see Figure 1.

$s_{cof} = \mathbf{hide}\ \{\textit{make-coffee}, \textit{make-tea}, \textit{error}\}\ \mathbf{in}\ s_{mon} \parallel s_{drk}$
$i_{cof} = \mathbf{hide}\ \{\textit{make-coffee}, \textit{make-tea}, \textit{error}\}\ \mathbf{in}\ i_{mon} \parallel i_{drk}$



Fig. 1. Architecture of coffee machine in components 



The money component accepts coins of €1 and of €0.50 as input from the
environment. After insertion of a €0.50 coin (respectively €1 coin), the money component
orders the drink component to make tea (respectively coffee).

The drink component interfaces with the money component and the environment. 
If the money component orders it to make tea (respectively coffee) it outputs tea (respec- 
tively coffee) to the environment. If anything goes wrong in the drink making process, 
the component gives an error signal. 

The coffee machine is the parallel composition of the money component and the 
drink component, in which the “make coffee” command, the “make tea” command and 
the “error” signal are hidden. One can think of the parallel composition as establishing 
the connection between the money component and the drink component, whereas hiding 
means that the communication between the components is not observable anymore; 
only communication with the environment can be observed. 

Models. In Figure 2 we show the behavioral specification of the money component
$s_{mon}$ and the drink component $s_{drk}$ as LTS's. Note that the money component is
underspecified for the error input label and that the drink component cannot recover
from an error state, and while in the error state it cannot produce tea or coffee.
Figure 3 shows implementation models of the money component, $i_{mon}$, and the drink
component, $i_{drk}$. We have used transitions labeled with '?' as an abbreviation for all
the non-specified input actions from the alphabet of the component. The money
component has input label set $I_{mon} = \{0.50, 1.00, \textit{error}\}$, output label set
$U_{mon} = \{\textit{make-coffee}, \textit{make-tea}, 0.50, 1.00\}$, specification
$s_{mon} \in \mathcal{LTS}(I_{mon}, U_{mon})$, and implementation
$i_{mon} \in \mathcal{IOTS}(I_{mon}, U_{mon})$.
$I_{drk} = \{\textit{make-coffee}, \textit{make-tea}\}$ and
$U_{drk} = \{\textit{coffee}, \textit{tea}, \textit{error}\}$ are the input and output label
sets, respectively, and $s_{drk} \in \mathcal{LTS}(I_{drk}, U_{drk})$,
$i_{drk} \in \mathcal{IOTS}(I_{drk}, U_{drk})$.

In the implementations of the components we choose to improve upon the specifica- 
tion, by adding functionality. This is possible since ioco allows partial specifications. 
Implementers are free to make use of the underspecification. The extra functionality 
of $i_{mon}$ compared to its specification $s_{mon}$ is that it can handle error signals: it reacts
by returning €1.00. $i_{drk}$ is also changed with respect to its specification $s_{drk}$: making
tea never produces an error signal. Since implementations are input enabled, we have 
chosen that all non specified inputs are ignored, i.e., the system remains in the same 
state.

[Figure 2 panels: money component specification; drink component specification]
Fig. 2. Specification of money and drink components as LTS's

[Figure 3 panels: money component implementation; drink component implementation]
Fig. 3. Implementation of the money and drink components as IOTS's



We have $i_{mon} \mathrel{\mathbf{ioco}} s_{mon}$ and $i_{drk} \mathrel{\mathbf{ioco}} s_{drk}$.
The question now is whether the integrated implementation, as given by $i_{cof}$ in
Figure 1, is also ioco-correct with respect to the integrated specification $s_{cof}$. We
discuss this in Section 4, to illustrate the compositionality properties discussed there.



3.2 Compositional Testing 

We now paraphrase the question of compositional testing, discussed in the introduction,
as follows: "Given that the components p and q have been tested to be ioco-correct
(according to their respective specifications), may we conclude that their integration is
also ioco-correct (according to the integrated specification)?" If the component
specifications are LTS's, the component implementations are modeled by IOTS's, and their
integration by parallel composition followed by hiding, this boils down to the following
questions in our formal framework (where $i_k \in \mathcal{IOTS}(I_k, U_k)$ and
$s_k \in \mathcal{LTS}(I_k, U_k)$ for $k = 1, 2$, with $I_1 \cap I_2 = U_1 \cap U_2 = \emptyset$):

Q1: Given $i_k \mathrel{\mathbf{ioco}} s_k$ for $k = 1, 2$, is it the case that
    $i_1 \parallel i_2 \mathrel{\mathbf{ioco}} s_1 \parallel s_2$?

Q2: Given $i_1 \mathrel{\mathbf{ioco}} s_1$, is it the case that
    $(\mathbf{hide}\ V\ \mathbf{in}\ i_1) \mathrel{\mathbf{ioco}} (\mathbf{hide}\ V\ \mathbf{in}\ s_1)$
    for arbitrary $V \subseteq U_1$?

If the answer to both questions is "yes", then we may conclude that ioco is suitable for
compositional testing, as stated in the following conjecture.

Conjecture 1. If $i_k \in \mathcal{IOTS}(I_k, U_k)$ and $s_k \in \mathcal{LTS}(I_k, U_k)$ for
$k = 1, 2$ with $I_1 \cap I_2 = U_1 \cap U_2 = \emptyset$ and
$V = (I_1 \cap U_2) \cup (U_1 \cap I_2)$, then

$i_1 \mathrel{\mathbf{ioco}} s_1 \wedge i_2 \mathrel{\mathbf{ioco}} s_2 \implies (\mathbf{hide}\ V\ \mathbf{in}\ i_1 \parallel i_2) \mathrel{\mathbf{ioco}} (\mathbf{hide}\ V\ \mathbf{in}\ s_1 \parallel s_2)$.

We study the above pre-congruence questions in the next section. We will show that
the answer to Q1 and Q2 in general is no. Instead, we can show that the answer to Q1
and Q2 is yes if $s_1$ and $s_2$ are completely specified.



4 Compositionality for Synchronization and Hiding 

In this section we address the questions Q1 and Q2 formulated above (Section 3.2),
using the coffee machine example to illustrate our results. 

4.1 Synchronization 

The property that we investigate for parallel composition is: if we have two correct 
component implementations according to ioco, then the implementation remains cor- 
rect after synchronizing the components. It turns out that in general this property does 
not hold, as we show in the following example. 

Example 1. Consider the LTS's in Figure 4. On the left hand side we show the
specifications and on the right hand side the corresponding implementations. The models
have the following label sets: $s_1 \in \mathcal{LTS}(\{x\}, \emptyset)$,
$i_1 \in \mathcal{IOTS}(\{x\}, \emptyset)$, $s_2 \in \mathcal{LTS}(\emptyset, \{x\})$ and
$i_2 \in \mathcal{IOTS}(\emptyset, \{x\})$. The suspension traces of $s_1$ are given by
$\delta^* \cup \delta^* ?x \delta^*$ and the suspension traces of $s_2$ are given by
$\{\epsilon, !x\} \cup !x!x\delta^*$. We have $i_1 \mathrel{\mathbf{ioco}} s_1$ and
$i_2 \mathrel{\mathbf{ioco}} s_2$.

After we take the parallel composition of the two specifications we get
$s_1 \parallel s_2$, see Figure 4 (the corresponding implementation is $i_1 \parallel i_2$).
We now see the following:
$out(i_1 \parallel i_2\ \mathbf{after}\ !x) = \{!x\} \not\subseteq out(s_1 \parallel s_2\ \mathbf{after}\ !x) = \{\delta\}$;
this means that the parallel composition of the implementations is not ioco-correct:
$i_1 \parallel i_2 \;\not\mathbf{ioco}\; s_1 \parallel s_2$.

Analysis shows that $i_1 \mathrel{\mathbf{ioco}} s_1$, because ioco allows underspecification
of input actions. However, the semantics of the parallel composition operator does not take
underspecification of input actions into account. Although $s_2$ can output a second $x$, it
cannot do so in $s_1 \parallel s_2$, because $s_1$ cannot input the second $x$.

Fig. 4. Counter example to compositionality for parallel composition; see Example 1



It turns out that if we forbid implicit underspecification, i.e., if the specification 
explicitly prescribes for any possible input what the allowed responses are, then we do 
not have this problem. In fact in that case we have the desired compositionality property. 
This property is expressed in the following theorem. For a proof see [12]. 

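Theorem 1. Let $s_1, i_1 \in \mathcal{IOTS}(I_1, U_1)$, $s_2, i_2 \in \mathcal{IOTS}(I_2, U_2)$,
with $I_1 \cap I_2 = U_1 \cap U_2 = \emptyset$.

$i_1 \mathrel{\mathbf{ioco}} s_1 \wedge i_2 \mathrel{\mathbf{ioco}} s_2 \implies i_1 \parallel i_2 \mathrel{\mathbf{ioco}} s_1 \parallel s_2$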

Our running example (Section 3.1) shows the same problem illustrated in Example 1.
Although the implementations of the money component and the drink component
are ioco-correct with respect to their specifications, it turns out that the parallel
composition of $i_{mon}$ and $i_{drk}$ is not:

$out(i_{mon} \parallel i_{drk}\ \mathbf{after}\ ?1.00 \cdot !\textit{make-coffee}) = \{!\textit{coffee}, !\textit{error}\}$
$out(s_{mon} \parallel s_{drk}\ \mathbf{after}\ ?1.00 \cdot !\textit{make-coffee}) = \{!\textit{coffee}\}$

Note that the internal signals are still visible as output actions. To turn them into 
internal actions is the task of the hiding operator, discussed below. 

4.2 Hiding 

The property that we investigate for hiding is the following: if we have a correct im- 
plementation according to ioco, then the implementation remains correct after hiding 
(some of the) output actions. It turns out that, as for synchronization, in general this 
property does not hold. 

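Example 2. Consider the implementation $i$ and specification $s$ in Figure 5, both with
input set $\{a\}$ and output set $\{x, y\}$. The suspension traces of $s$ are
$\{\epsilon\} \cup ?a\delta^* \cup !x\delta^*$. We see that $i \mathrel{\mathbf{ioco}} s$.

We get the specification $\mathbf{hide}\ \{x\}\ \mathbf{in}\ s$ and the implementation
$\mathbf{hide}\ \{x\}\ \mathbf{in}\ i$ after hiding the output action $x$. After the input $a$
we get:
$out(\mathbf{hide}\ \{x\}\ \mathbf{in}\ i\ \mathbf{after}\ ?a) = \{\delta, y\} \not\subseteq out(\mathbf{hide}\ \{x\}\ \mathbf{in}\ s\ \mathbf{after}\ ?a) = \{\delta\}$;
in other words the ioco relation does not hold:
$\mathbf{hide}\ \{x\}\ \mathbf{in}\ i \;\not\mathbf{ioco}\; \mathbf{hide}\ \{x\}\ \mathbf{in}\ s$.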

An analysis of the above example shows that $s$ was underspecified, in the sense that
it fails to prescribe how an implementation should behave after the trace $!x?a$. The
proposed implementation $i$ uses the implementation freedom by having an unspecified
$y$-output after $!x?a$. However, if $x$ becomes unobservable due to hiding, then
the traces $!x?a$ and $?a$ collapse and become indistinguishable: in
$\mathbf{hide}\ \{x\}\ \mathbf{in}\ s$ and $\mathbf{hide}\ \{x\}\ \mathbf{in}\ i$ they both
masquerade as the trace $?a$. Now $\mathbf{hide}\ \{x\}\ \mathbf{in}\ s$ appears to
specify that after $?a$, only quiescence ($\delta$) is allowed; however,
$\mathbf{hide}\ \{x\}\ \mathbf{in}\ i$ still has this unspecified $y$-output. In other words,
hiding creates confusion about what part of the system is underspecified.

Fig. 5. Counter-example to compositionality for hiding; see Example 2

It follows that if we rule out underspecification, i.e., we limit ourselves to
specifications that are IOTS's, then this problem disappears. In fact, in that case we do
have the desired congruence property. This is stated in the following theorem. For a proof
see [12].

Theorem 2. If $i, s \in \mathcal{IOTS}(I, U)$ with $V \subseteq U$, then:

$i \mathrel{\mathbf{ioco}} s \implies (\mathbf{hide}\ V\ \mathbf{in}\ i) \mathrel{\mathbf{ioco}} (\mathbf{hide}\ V\ \mathbf{in}\ s)$



5 Demonic Completion 

We have shown in the previous section that ioco is a pre-congruence for parallel
composition and hiding when restricted to $\mathcal{IOTS} \times \mathcal{IOTS}$. However,
in the original theory [1] $\mathbf{ioco} \subseteq \mathcal{IOTS} \times \mathcal{LTS}$;
the specifications are LTS's. The intuition behind this is that ioco allows
underspecification of input actions. In this section we present a function that transforms
LTS's into IOTS's in a way that complies with this notion of underspecification. We will
show that this leads to a new implementation relation that is slightly weaker than ioco.

Underspecification comes in two flavors: underspecification of input actions and 
underspecification of output actions. Underspecification of output actions is always ex- 
plicit; in an LTS it is represented by a choice between several output actions. The intu- 
ition behind this is that we do not know or care which of the output actions is imple- 
mented, as long as at least one is. Underspecification of input actions is always implicit; 
it is represented by absence of the respective input action in the LTS. The intuition be- 
hind underspecification of input actions is that after an unspecified input action we do 
not know or care what the behavior of the specified system is. This means that in an 
underspecified state — i.e., a state reached after an unspecified input action — every 
action from the label set is correct, including quiescence. Following [13] we call this 
kind of behavior chaotic. 






Machiel van der Bijl, Arend Rensink, and Jan Tretmans 



In translating LTS's to IOTS's, we propose to model underspecification of input
actions explicitly. Firstly, we model chaotic behavior through a state $q_\chi$ with the
property: $\forall \lambda \in U : q_\chi \stackrel{\lambda}{\Longrightarrow} q_\chi$ and
$\forall \lambda \in I : q_\chi \stackrel{\lambda}{\Longrightarrow} q_\chi$ (where $\chi$
stands for chaos). Secondly, we add for every stable state $q$ (of a given LTS) that is
underspecified for an input $a$, a transition $(q, a, q_\chi)$. This turns the LTS into an
IOTS. After [10] we call this procedure demonic completion, as opposed to angelic
completion, where unspecified inputs are discarded (modeled by adding self-loop
transitions). Note that demonic completion results in an IOTS that is not strongly
convergent. However, the constraint of strong convergence only holds for LTS's.

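Definition 6.

$\Xi : \mathcal{LTS}(I, U) \to \mathcal{IOTS}(I, U)$ is defined by
$\langle Q, I, U, T, q_0 \rangle \mapsto \langle Q', I, U, T', q_0 \rangle$, where

$Q' = Q \cup \{q_\chi, q_\Omega, q_\Delta\}$, where $q_\chi, q_\Omega, q_\Delta \notin Q$

$T' = T \cup \{(q, a, q_\chi) \mid q \in Q, a \in I, q \not\xrightarrow{a}, q \not\xrightarrow{\tau}\}$
$\quad \cup\ \{(q_\chi, \tau, q_\Omega), (q_\chi, \tau, q_\Delta)\} \cup \{(q_\Omega, \lambda, q_\chi) \mid \lambda \in L\} \cup \{(q_\Delta, \lambda, q_\chi) \mid \lambda \in I\}$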
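
As a rough illustration of Definition 6, the following Python sketch adds the three chaos states and the transitions into them. It assumes a representation chosen here for illustration (transitions as triples, the internal action named `"tau"`, and hypothetical names for the three added states); it is not the authors' tooling.

```python
TAU = "tau"                                           # assumed name for the internal action
CHI, OMEGA, DELTA_STATE = "q_chi", "q_omega", "q_delta"  # hypothetical names for the added states

def demonic_completion(states, inputs, outputs, trans):
    """Return the transition set T' of the demonically completed system."""
    if set(states) & {CHI, OMEGA, DELTA_STATE}:
        raise ValueError("added state names must be fresh")
    enabled = {(a, mu) for (a, mu, _) in trans}
    # Stable states (no tau) that lack an input get a transition into chaos.
    added = {(q, a, CHI) for q in states for a in inputs
             if (q, a) not in enabled and (q, TAU) not in enabled}
    # From chaos, silently choose between "anything goes" and "quiescent".
    added |= {(CHI, TAU, OMEGA), (CHI, TAU, DELTA_STATE)}
    added |= {(OMEGA, lam, CHI) for lam in inputs | outputs}
    added |= {(DELTA_STATE, lam, CHI) for lam in inputs}
    return set(trans) | added

# Illustrative specification: state 0 accepts "a" only, state 1 emits "x".
completed = demonic_completion({0, 1}, {"a", "b"}, {"x"},
                               {(0, "a", 1), (1, "x", 0)})
```

No original transition is removed, and every stable state of the original system becomes input-enabled; the added states only lead back to the chaos state.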



[Figure 6 panels: specification (LTS); demonically completed specification (IOTS)]

Fig. 6. Demonic completion of an LTS specification



Example 3. To illustrate the demonic completion of implicit underspecification, we use
the money component of Section 3.1. The LTS specification of the money component
is given in the top left corner of Figure 6. The IOTS that models our chaos property
is given in the bottom left corner. For every stable state of the specification that is
underspecified for an input action, the function $\Xi$ adds a transition with that input
action to state $q_\chi$. For example, every state is underspecified for input action error,
so we add a transition from every state to $q_\chi$ for error. The states $q_1$ and $q_2$ are
underspecified for 0.50 and 1.00, so we add transitions for these inputs from $q_1$ and
$q_2$ to $q_\chi$. The resulting demonically completed specification is given on the right
hand side of Figure 6.

An important property of demonic completion is that it only adds transitions from
stable states with underspecified inputs in the original LTS to $q_\chi$. Moreover, it does not
delete states or transitions. Furthermore, the chaotic IOTS acts as a kind of sink: once
one of the added states ($q_\chi$, $q_\Omega$ or $q_\Delta$) has been reached, they will never be
left anymore.

Proposition 2. Let $s \in \mathcal{LTS}(I, U)$.
$\forall \sigma \in L_\delta^*, q' \in Q_s : s \stackrel{\sigma}{\Longrightarrow} q' \iff \Xi(s) \stackrel{\sigma}{\Longrightarrow} q'$

We use the notation "$\mathbf{ioco} \circ \Xi$" to denote that before applying ioco, the LTS
specification is transformed to an IOTS by $\Xi$; i.e.,
$i \mathrel{(\mathbf{ioco} \circ \Xi)} s \iff i \mathrel{\mathbf{ioco}} \Xi(s)$. This relation is
slightly weaker than ioco. This means that previously conformant implementations are still
conformant, but it might be that previously non-allowed implementations are allowed with
this new notion of conformance.

Theorem 3. $\mathbf{ioco} \subseteq \mathbf{ioco} \circ \Xi$

Note that the opposite is not true, i.e.,
$i \mathrel{(\mathbf{ioco} \circ \Xi)} s \not\Rightarrow i \mathrel{\mathbf{ioco}} s$ (as the
counterexamples of Section 4 show). Furthermore this property is a consequence of our
choice of the demonic completion function. Other forms of completion, such as angelic
completion, result in variants of ioco which are incomparable to the original relation.

Testing. The testing scenario is now such that an integrated system can be tested by
comparing the individual components to their demonically completed specifications. If
the components conform, then the composition of implementations also conforms to
the composition of the demonically completed specifications.

Corollary 1. Let $s_1, s_2 \in \mathcal{LTS}(I, U)$ and $i_1, i_2 \in \mathcal{IOTS}(I, U)$.

$i_1 \mathrel{\mathbf{ioco}} \Xi(s_1) \wedge i_2 \mathrel{\mathbf{ioco}} \Xi(s_2) \implies i_1 \parallel i_2 \mathrel{\mathbf{ioco}} \Xi(s_1) \parallel \Xi(s_2)$

Test Restriction. A disadvantage of demonic completion is that it destroys informa- 
tion about underspecified behavior. On the basis of the underspecified LTS, one can 
conclude that traces including an unspecified input need not be tested because every 
implementation will always pass; after completion, however, this is no longer visible, 
and so automatic test generation will yield many spurious tests. 

In order to avoid this, we characterize $\mathbf{ioco} \circ \Xi$ directly over LTS's. In
other words, we extend the relation from $\mathcal{IOTS} \times \mathcal{IOTS}$ to
$\mathcal{IOTS} \times \mathcal{LTS}$, in such a way as to obtain the same testing power
but to avoid these spurious tests. For this purpose, we restrict the number of traces after
which we test.

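Definition 7. Let $s \in \mathcal{LTS}(I, U)$.

$Utraces(s) =_{def} \{ \sigma \in L_\delta^* \mid s \stackrel{\sigma}{\Longrightarrow} \wedge\ (\nexists q', \sigma_1 \cdot a \cdot \sigma_2 = \sigma : a \in I \wedge s \stackrel{\sigma_1}{\Longrightarrow} q' \wedge q' \not\stackrel{a}{\Longrightarrow}) \}$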

Intuitively, the Utraces are the Straces without the underspecified traces. A trace
$\sigma$ is underspecified if there exists a prefix $\sigma_1 \cdot a$ of $\sigma$, with
$a \in I$, for which $s \stackrel{\sigma_1}{\Longrightarrow} q'$ and
$q' \not\stackrel{a}{\Longrightarrow}$. We use $\mathbf{ioco}_U$ as a shorthand for
$\mathbf{ioco}_{Utraces}$. In the following theorem we state that $\mathbf{ioco}_U$ is
equivalent to $\mathbf{ioco} \circ \Xi$. This equivalence is quite intuitive:
$\mathbf{ioco} \circ \Xi$ uses extra states to handle underspecified behavior, which are
constructed so as to display chaotic behavior. If $\Xi(s)$ reaches such a state, then all
behavior is considered correct. $\mathbf{ioco}_U$, on the other hand, circumvents
underspecified behavior, because it uses Utraces.
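
The membership test behind Utraces can be sketched in Python as follows. This is a minimal illustration under assumptions made here: finite transition systems given as sets of triples, the label names `"tau"` and `"delta"`, and a reading of "underspecified" in which some state reachable by the prefix cannot accept the input even after internal steps.

```python
TAU, DELTA = "tau", "delta"  # assumed names for the internal action and quiescence

def closure(trans, states):
    seen, todo = set(states), list(states)
    while todo:
        q = todo.pop()
        for (a, mu, b) in trans:
            if a == q and mu == TAU and b not in seen:
                seen.add(b)
                todo.append(b)
    return frozenset(seen)

def step(trans, outputs, states, label):
    if label == DELTA:  # quiescent states: no output and no internal action enabled
        nxt = {q for q in states
               if not any(a == q and (mu == TAU or mu in outputs) for (a, mu, _) in trans)}
    else:
        nxt = {b for (a, mu, b) in trans if a in states and mu == label}
    return closure(trans, nxt)

def is_utrace(trans, inputs, outputs, init, sigma):
    states = closure(trans, {init})
    for label in sigma:
        if label in inputs:
            # Underspecified: some state reached so far cannot (weakly) accept the input.
            if any(not step(trans, outputs, closure(trans, {q}), label) for q in states):
                return False
        states = step(trans, outputs, states, label)
        if not states:
            return False  # not even a suspension trace
    return True

# Illustrative nondeterministic specification: after "a", only one branch accepts "b".
T = {(0, "a", 1), (0, "a", 2), (1, "b", 3), (2, "x", 4)}
I, U = {"a", "b"}, {"x"}
```

In this example the trace `("a", "b")` is a suspension trace of the specification, but it is excluded from the Utraces because one of the states reached after `"a"` does not specify the input `"b"`.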

Theorem 4. $\mathbf{ioco}_U = \mathbf{ioco} \circ \Xi$

6 Conclusions

The results of this paper imply that ioco can be used for compositional testing if the
specifications are modeled as IOTS's; see Theorems 1 and 2.

We proposed the function $\Xi$ to complete an LTS specification; i.e., transform an
LTS to an IOTS in a way that captures our notion of underspecification. This means that
the above results become applicable and the ioco theory with completed specifications
can be used for compositional testing. The resulting relation is slightly weaker than
the original ioco relation; previously conformant implementations are still conformant,
but it might be that previously non-conformant implementations are allowed under the
modified notion of conformance.

Testing after completion is in principle (much) more expensive since, due to the
nature of IOTS's, even the completion of a finite specification already displays infinite
testable behavior. As a final result of this paper, we have presented the implementation
relation $\mathbf{ioco}_U$. This relation enables us to use the original component
specifications, before completion, for compositional testing (see Theorem 4).

The insights gained from these results can be recast in terms of underspecification. 
ioco recognizes two kinds of underspecification: omitting input actions from a state 
(which implies a "don't care" if an input does occur) and including multiple output actions
from a state (which allows the implementation to choose between them). It turns out that 
the first of these two is not compatible with parallel composition and hiding. 

Testing in Context. We have discussed the pre-congruence properties mainly in the
context of compositional testing, but the results can easily be transposed to testing in
context. Suppose an implementation under test $i$ is tested via a context $c$. The tester
interacts with $c$, and $c$ interacts with $i$; the tester cannot directly interact with $i$.
Then we have $I_i \subseteq U_c$ and $U_i \subseteq I_c$, and $L_i$ is not observable for the
tester, i.e., hidden. The tester observes the system as an implementation in a context in
the following way: $C[i] = \mathbf{hide}\ (I_i \cap U_c) \cup (I_c \cap U_i)\ \mathbf{in}\ c \parallel i$.
Now Theorems 1 and 2 directly lead to the following corollary for testing in context.

Corollary 2. Let $s, i \in \mathcal{IOTS}$ occur in test context $C[\,]$.
$C[i] \;\not\mathbf{ioco}\; C[s] \implies i \;\not\mathbf{ioco}\; s$

Hence, an error detected while testing the implementation in its context is a real 
error of the implementation, but not the other way around: an error in the implemen- 
tation may not be detectable when tested in a context. This holds of course under the 
assumption that the test context is error free. 

Relevance. We have shown a way to handle underspecification of input actions when 
testing communicating components with the ioco theory. This idea is new for LTS test- 
ing. It is inspired by [10] and work done on partial specifications in FSM testing [11].

Furthermore we have established a pre-congruence result for ioco for parallel com- 
position and hiding. This is important because it shows that ioco is usable for com- 
positional testing and testing in context. It establishes a formal relation between the 
components and the integrated system. As far as we know this result is new for both 
LTS testing and FSM testing. In FSM testing there are so called Communicating FSM’s 
to model the integration of components. However we have not found any relevant re- 
search on the relation between conformance with respect to the CFSM and conformance 
with respect to its component FSM’s.

Traditionally conformance testing is seen as the activity of checking the conformance
of a single black box implementation against its specification. The testing of
communicating components is often considered to be outside the scope of conformance 
testing. The pre-congruence result shows that the ioco theory can handle both problems 
in the same way. 

Future Work. The current state of affairs is not yet completely satisfactory, because the
notion of composition that we require is not defined on general labeled transition
systems but just on IOTS's. Testing against IOTS's is inferior, in that these models do not
allow the "input underspecification" discussed above: for that reason, testing against
an IOTS cannot take advantage of information about "don't care" inputs (essentially,
no testing is required after a "don't care" input, since by definition every behavior is
allowed). We intend to solve this issue by extending IOTS's with a predicate that
identifies our added chaotic states. Testing can stop when the specification has reached a
chaotic state.

Abstract. Observation objectives are behaviours that an implementation under
test is expected to exhibit during testing. It is desirable to express the objectives 
at a high level of behavioural abstraction. Unfortunately, current specification 
methods do not offer proper expressiveness for this. In this paper we demonstrate 
how observation objectives can be declared when the specification of a system 
consists of a formal abstraction hierarchy. 



1 Introduction 

[13] defines observation objectives as a set of behaviours¹ that one attempts to produce
in an implementation under test (iut). These objectives are not directly related to the 
correctness of an iut, but they are sets of interesting behaviours or behaviours likely 
to contain errors. Formally, they are simply sets of behaviours that intersect the set of 
behaviours induced by the specification of the system. 

Observation objectives are created manually in terms of a specification. Unfortu- 
nately, detailed models of complex systems are themselves complex and, therefore, they 
are hard to grasp. The complexity is managed with abstractions. By omitting details from 
the models they become more understandable. This holds also for observation objectives 
- defining them at a more abstract level than the low-level complex specification makes 
them more understandable. 

Creating abstractions for low-level specifications has been a popular research 
topic [11,8]. The idea in abstracting is to reduce the complexity of a concrete specification 
by hiding details. For example, if a specification is a finite state automaton an abstraction 
can be created by grouping several lower level states into abstract states [10,9]. Then 
different concrete level behaviours appear similar at the more abstract level, and one of 
them can be selected as a representative for the whole group of behaviours. 

Proper behavioural abstractions do not appear in specifications accidentally but the 
specifier must have had them in mind when writing the specification. Abstractions in 
current specification methods concentrate on traditional units of modularity, like classes 
or packages, not on behaviours. Behavioural abstractions need to be produced afterwards 
by abstracting low-level specifications. This post-abstracting is not always feasible, 
especially if knowledge of the abstractions is not explicitly stated. 

* This work has been partly funded by Academy of Finland (project 5100005). 

¹ A behaviour is a possibly infinite sequence of states $b = (s_0, s_1, s_2, \ldots)$, where each
state $s_i$ consists of an assignment of values to the variables of the specification.

A. Petrenko and A. Ulrich (Eds.): FATES 2003, LNCS 2931, pp. 101-113, 2004.

Fig. 1. Mapping of behaviours

In this paper we demonstrate how observation objectives can be created by utilizing 
a specification method whose specifications consist of abstraction hierarchies [5]. By 
giving specifications in this way post-abstracting is not needed. We define the notion of 
a declarative test case (dtc) to be an expression determining a set of behaviours that the 
iut is enforced to exhibit during testing. The set is an observation objective. Dtcs can
be given at any level in the specification hierarchy. 

The formal abstraction hierarchy enables a rigorous mapping from the behaviours 
of a concrete specification to the behaviours of a more abstract one. Utilizing this, it is 
possible to observe a behaviour of the iut at a concrete level and then map the observed 
behaviour to any level in the abstraction hierarchy and check if an observed behaviour 
satisfies a given dtc. Figure 1 illustrates the setting. Rectangles are specifications ($c$,
$a_1$, $a_2$ and $ma$) forming an abstraction hierarchy. Each specification induces a set of
behaviours depicted inside the rectangles. The two ellipses stand for observation
objectives. In the figure, the only depicted concrete level behaviour is mapped to a
behaviour in both specifications which are refined by the most concrete specification. The
behaviour satisfies the observation objective given in specification $a_1$ but not the
objective declared in $a_2$.

We begin by defining basic notions of observation objectives and declarative test 
cases in Sect. 2. The section discusses how dtcs are observed in behaviours. Section 3 
introduces the specification method DisCo, and Sect. 4 describes how dtcs are created 
in practice. Section 5 concludes the paper. 

2 Declarative Test Cases 

In the following we concentrate on specifications that induce possibly infinite sets of
infinite sequences of states called behaviours. In [2] a formal basis for testing is presented,
utilizing joint-action specifications. The testing is justified with a testing hypothesis,
which states that every implementation under test (iut) has a formal model ($m_{iut}$) that
is indistinguishable from iut if they are both put in a black box. In the paper a formal
meaning is assigned for correct and incorrect implementations; an implementation is
correct if all behaviours that $m_{iut}$ induces are legal, and incorrect if illegal behaviours
exist. Testing is an attempt to produce the latter behaviours.

Fig. 2. Venn diagram of behaviours



An observation objective [13] is a set of behaviours that one wishes to observe
during testing. They are defined in terms of a specification. Whether an iut exhibits 
an observation objective or not has nothing to do with correctness, but the two matters 
are orthogonal. Observed behaviours are divided into three categories with respect to an 
observation objective. A legal behaviour satisfying an observation objective is flagged as 
successful, an illegal scenario as failed, and a legal scenario not satisfying the objective 
as inconclusive. 
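
The three-way classification can be written down directly. The sketch below is only an illustration: the predicates `is_legal` and `meets_objective` are hypothetical stand-ins for "legal with respect to the specification" and "belongs to the observation objective", and the toy behaviours are invented here.

```python
def verdict(behaviour, is_legal, meets_objective):
    """Classify an observed behaviour with respect to an observation objective."""
    if not is_legal(behaviour):
        return "failed"          # illegal behaviour, whatever the objective says
    if meets_objective(behaviour):
        return "successful"      # legal and inside the objective
    return "inconclusive"        # legal, but the objective was not observed

# Toy setting (assumed): states are integers, legal behaviours never go negative,
# and the objective is to observe the value 3 at some point.
legal = lambda b: all(s >= 0 for s in b)
objective = lambda b: 3 in b
```

Correctness is checked first, so an illegal behaviour is flagged as failed even if it happens to pass through states named by the objective.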

The Venn diagram in Fig. 2 depicts the different sets of behaviours. The rectangle on 
the right hand side is the universe of behaviours containing all legal and illegal behaviours 
in terms of specification spec. The specification induces a set of legal behaviours depicted 
as a large horizontal ellipse in the middle; the horizontal ellipse on the left-hand side is 
the set of behaviours that the implementation (or its model $m_{iut}$ to be precise) induces.
An observation objective is depicted as the small horizontal ellipse. Behaviours and 
b-j are successful, behaviours 63 and 612 are failed, and behaviours b\, b^ and &g are 
inconclusive. 

2.1 Defining DTCs 

Observation objectives are declared with declarative test cases (dtc), which consist of 
propositions over states. A dtc is a quantified sequence of state expressions. 

$$
\mathrm{DTC} = \exists\, o_0, o_1, \ldots, o_{k-1} \in \mathit{Objects} :
\langle\, p_0(o_0, o_1, \ldots, o_{k-1}),\;
p_1(o_0, o_1, \ldots, o_{k-1}),\;
\ldots,\;
p_{n-1}(o_0, o_1, \ldots, o_{k-1}) \,\rangle,
$$

where Objects is a set of all objects of the specification instance (we assume an object- 
oriented specification method). 

The iut is said to satisfy a dtc if it exhibits a behaviour where, beginning from the initial
state $s_0$, expression $p_0$ is satisfied (for some binding to the free variables) until $p_1$ is
satisfied (for the same binding), and so on until $p_{n-1}$ is satisfied.

Fig. 3. Mapping from states of behaviour to three tuples



2.2 Satisfying DTC 

The decision problem 'does a given behaviour satisfy a given dtc' is computationally
hard. Given a behaviour b, solving the decision problem requires binding
the actual objects to quantified variables and then finding a subsequence of states in the
behaviour satisfying the dtc. This leads to combinatorial explosion when the number
of quantified variables and the number of expressions grow. In this subsection we only
scratch the surface of the complexity issues.

Let us first consider the case where the quantified objects are bound: 

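$$
\mathrm{DTC} = \langle\, p_0(\hat{o}_0, \hat{o}_1, \ldots, \hat{o}_{k-1}),\;
p_1(\hat{o}_0, \hat{o}_1, \ldots, \hat{o}_{k-1}),\;
\ldots,\;
p_{n-1}(\hat{o}_0, \hat{o}_1, \ldots, \hat{o}_{k-1}) \,\rangle
$$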



When validating whether a given behaviour $\langle s_0, s_1, \ldots \rangle$ satisfies the dtc, the behaviour is
mapped to a sequence of n-tuples¹ $\langle t_0, t_1, \ldots \rangle$ so that each state is mapped to an n-tuple
$(b_0, b_1, \ldots, b_{n-1})$ of booleans. If $p_i$ is satisfied in a state then $b_i$ is 1, and 0 otherwise.
Now, the regular expression $(1xx \ldots x)^+ (x1x \ldots x)^+ \ldots (xxx \ldots 1)^+$ (where 'x' denotes either 0
or 1 and $^+$ denotes repetition one or more times) defines a language on n-tuples. The
language consists of the sequences of n-tuples which are images of the behaviours that
satisfy the corresponding dtc. The regular expression simply defines that an arbitrary
(but not zero) number of states, starting from the initial state, satisfies the first state
expression of the dtc, until starting from some state the second state expression is
satisfied, and so on until the last state expression is satisfied.

Figure 3 illustrates the setting for the case where the dtc consists of three propositions
$p_0$, $p_1$, $p_2$. Each state $s_i$ of the sample behaviour (on the top) is mapped to a three-tuple
$t_i$ of booleans (in the bottom) according to the sketched rules. Propositions $p_0$ and $p_2$ are
satisfied in the initial state $s_0$, but $p_1$ is not. Therefore, $s_0$ is mapped to tuple (1,0,1). The
second state, which satisfies only $p_0$, is mapped to tuple (1,0,0), and so on. In this case the
regular expression to be matched to the sequence of the tuples is $(1xx)^+ (x1x)^+ (xx1)^+$.
It matches for instance to the subsequence to,ti,t 2 ,to, which means that dtc is satisfied 
by the behaviour. The satisfaction can be checked for example with a non-deterministic 
finite state automaton depicted in Fig. 4. 
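
The check for a bound dtc can be sketched as a small simulation of a non-deterministic
automaton. The sketch below is an illustration only, not the automaton of the DisCo tools:
the function name satisfies_bound_dtc, the representation of propositions as Python
predicates, and the integer-valued example states are all assumptions. Because behaviours
are infinite, the function inspects a finite observed prefix.

```python
from typing import Callable, Iterable, Sequence, TypeVar

S = TypeVar("S")


def satisfies_bound_dtc(prefix: Iterable[S],
                        props: Sequence[Callable[[S], bool]]) -> bool:
    """Return True if the prefix starts with a match of (1x..x)+ (x1x..x)+ ... (x..x1)+.

    Automaton state i (1 <= i <= n) means "at least one tuple of block i-1 has
    been read"; state 0 is the start state and state n is accepting.
    """
    n = len(props)
    active = {0}
    for state in prefix:
        # Map the state to its n-tuple of booleans.
        t = [bool(p(state)) for p in props]
        successors = set()
        for i in active:
            if i >= 1 and t[i - 1]:
                successors.add(i)        # stay in the current block
            if i < n and t[i]:
                successors.add(i + 1)    # move on to the next block
        active = successors
        if n in active:
            return True                  # the last expression has been satisfied
        if not active:
            return False                 # no run of the automaton survives
    return False


# Hypothetical example: states are integers and the dtc has three propositions.
PROPS = [lambda s: s < 2, lambda s: 2 <= s < 4, lambda s: s >= 4]
```

Keeping a set of active automaton states handles the case where one state satisfies
several propositions at once, which is why the automaton is non-deterministic.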

¹ Remember that n is the number of state expressions in the dtc.

Fig. 4. Automaton for checking whether a dtc is satisfied



In the general case we must, in principle, utilize the described method for each
possible binding of objects $o_0, o_1, \ldots, o_{k-1}$. In the worst case the number of the needed
automata is $l \times (l - 1) \times \cdots \times (l - k + 1)$, where $l$ is the number of objects and $k$ is
the number of quantified variables.
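
As a small worked example of this worst-case count, the sketch below enumerates the
bindings with the Python standard library. It assumes that distinct quantified variables
are bound to distinct objects and ignores object types, which is what the product above
counts; the object names and the function name bindings are hypothetical.

```python
from itertools import permutations
from math import perm


def bindings(objects, k):
    """Enumerate every binding of k quantified variables to distinct objects."""
    return permutations(objects, k)


# Four objects and three quantified variables give 4 * 3 * 2 bindings.
OBJECTS = ["acc", "ca", "till1", "till2"]
NUMBER_OF_AUTOMATA = sum(1 for _ in bindings(OBJECTS, 3))
```

Here NUMBER_OF_AUTOMATA equals math.perm(4, 3). In a typed specification many of these
bindings would be ruled out by the classes of the quantified variables.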

The required computational resources can be reduced by giving up exactness in
detecting the satisfaction of a dtc. One way of doing this is to select some
bindings and utilize only the corresponding automata. The generation of actual test
cases can then be driven with this knowledge.

3 The DisCo Method 

Dtcs themselves do not resolve the post-abstraction dilemma; a specification
method that supports abstraction hierarchies is also needed. The specifications must be
structured into behavioural units, not into traditional units like packages or objects.

One method that fulfills these prerequisites is the DisCo method [1], which is used for
specifying abstraction hierarchies in [5]. The application area of the method is reactive
and distributed systems. DisCo specifications consist of layered abstraction hierarchies,
where a more concrete specification is derived by superposition from a more abstract
one. The method comprises a specification language, whose semantics is given in terms
of TLA [12], a compiler and an animation tool [3].

In the following subsections we introduce how abstraction hierarchies are defined 
with DisCo. A simple specification of automatic teller machines (ATM for short) is used 
as a running example. The specification is a simplified version of our solution to the 
specification competition in the Formal Methods ’99 conference [4]. 

3.1 Basics of DisCo 

DisCo specifications consist of class and multi-object joint action [7,6] definitions.
Classes are sets of objects sharing the same structure, and they can be inherited in an
object-oriented manner. Executing actions is the only way to change the states of objects.
An action consists of a name, roles in which objects can participate, a guard which is
a boolean-valued expression usually referring to the participating objects, and a body
which consists of parallel assignments to attributes of the participating objects. If a
guard evaluates to true for some participant combination, the action is said to be enabled.
Only enabled actions can be executed. The idea behind a joint action is to model the joint
behaviour of the participating objects at an abstract level and hide details on how the
behaviour is implemented.

Below is an example of a DisCo specification, consisting of classes Account and
Till. The specification is the most abstract view of the ATM. Account objects have an
attribute balance. The specification layer has a global assertion legalBalance
stating that the balance is always non-negative. The assertion does not hold by
construction; to be formal, it must be verified. Action withdraw models a withdrawal.
The action has two roles, acc and till, and one parameter, amount, modelling the
withdrawn amount of money; the guard ensures that the participating account has enough
money; in the body of the action the amount is subtracted from the balance of the account.
A detailed description of the action deposit is omitted:

layer tills is
  class Account is
    balance: integer;
  end;

  class Till is end;

  assert legalBalance is ∀ a: Account :: a.balance ≥ 0;

  action withdraw(acc: Account; till: Till; amount: integer) is
  when amount > 0 ∧ acc.balance ≥ amount do
    acc.balance := acc.balance - amount;
  end;

  action deposit(a: Account; amount: integer) is
  when amount > 0 do
    a.balance := a.balance + amount;
  end;
end;
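
To make the guard and body semantics of a joint action concrete, the layer can be mimicked
in Python. This is a hedged sketch and not output of the DisCo compiler: the Python names
are invented for illustration, and the guard is assumed to read "the balance is at least
the amount", in line with the description that the account must have enough money.

```python
from dataclasses import dataclass


@dataclass
class Account:
    balance: int = 0


@dataclass
class Till:
    pass


def legal_balance(accounts) -> bool:
    """The global assertion legalBalance: every balance is non-negative."""
    return all(a.balance >= 0 for a in accounts)


def withdraw_enabled(acc: Account, till: Till, amount: int) -> bool:
    """Guard of withdraw, evaluated for one combination of participants."""
    return amount > 0 and acc.balance >= amount


def withdraw(acc: Account, till: Till, amount: int) -> None:
    """Body of withdraw; only an enabled action may be executed."""
    if not withdraw_enabled(acc, till, amount):
        raise ValueError("withdraw is not enabled for these participants")
    acc.balance -= amount


def deposit(a: Account, amount: int) -> None:
    if amount <= 0:
        raise ValueError("deposit is not enabled for these participants")
    a.balance += amount
```

With this guard a withdrawal can empty an account but never overdraw it, so the assertion
legalBalance is preserved by both actions.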



3.2 Refinement 

DisCo specifications can be refined by superposition. Among other things, our variant of 
superposition allows adding new attributes to classes, strengthening guards of actions, 
giving totally new actions, and adding new assignments to actions. The new assignments 
given in a refinement are restricted to refer only to newly introduced attributes. This 
implies that safety properties of the specification are preserved. Behaviours of the refined 
specification have a unique image in behaviours of all predecessor specification layers. 
A superposition step can be considered to define a feature of the system.

Below is an example of a simple refinement of specification tills. The specification
layer adds aspects related to bank cards to the specification. A totally new class
Card is given in the layer. The existing class Till is extended with a state machine
having states noCard and hasCard, the latter containing variable card, which is a
reference to the card inserted in the till. Relation cardAcc defines that there exists one
account for each card and an arbitrary number of cards for each account. The newly given
action insertCard models inserting a card into a till. Action withdraw is a refinement
of the action tills.withdraw. The refinement adds a new role card to the action
and strengthens the guard of the original action with additional conjuncts. The ellipsis
in the guard denotes the original guard, and in the body it refers to the original body:

layer cards is
  import tills;

  class Card is end;

  extend Till by
    state: (noCard, hasCard);
    extend hasCard by
      card: reference Card;
    end;
  end;

  relation cardAcc(Card, Account) is *:1;

  action insertCard(t: Till; c: Card) is
  when t.state'noCard ∧
       not (∃ t: Till :: t.state'hasCard.card = c) do
    t.state := hasCard(c);
  end;

  refined withdraw(acc: Account; till: Till; amount: integer; card: Card)
    of withdraw(acc, till, amount) is
  when ... till.state'hasCard ∧
       cardAcc(card, acc) do
    ...
  end;

  action ejectCard(t: Till; c: Card) is
  when t.state'hasCard do
    t.state := noCard();
  end;
end;



3.3 Composition 

Specifying independent refinements of a common root specification leads to several 
parallel refinements. These parallel specifications (or their refinements) can then be 
conjoined to one composite specification. Then data parts of the component specifications 
are merged and actions that have a common ancestor in their refinement histories are 
conjoined. 

Specification layer customers adds features related to users of the ATM system 
(they have a wallet in which the withdrawn money is inserted). The specification is, 
similarly to layer cards, a refinement of the root specification tills. Figure 5 depicts 
the final abstraction hierarchy of the ATM system. 

A DisCo specification need not fix the number of objects and their initial states. One 
specification can then be instantiated with various initial states and an arbitrary number 
of objects. 

When an abstraction hierarchy has been constructed, a testing engineer can select 
any abstraction in the hierarchy and create dtcs with respect to the selected abstraction. 
This is a natural way to focus testing on selected features of the system. 

4 Producing Declarative Test Cases in Practice 

Dtcs could, in principle, be created manually. However, in real-life testing this would be
too error-prone and the need for tool support is obvious. Fortunately, the DisCo method
comprises an animation tool for animating (even abstract) specifications on a computer
screen. The tool can be utilized for the task.

Fig. 5. The layered specification hierarchy of the ATM system

Before animating a specification it needs to be instantiated, which means creation 
of a legal initial state with some number of objects in their local initial states. This is 
carried out by dragging and dropping objects onto the canvas. After instantiation the tool
verifies that the initial conditions and assertions are not violated and (if this verification is 
successful) the tool changes to execution mode. Now, actions can be executed manually, 
or the tool can select the next action randomly. A sequence of executed actions can be 
saved as a scenario and rerun afterwards. The states of the specification instance can 
also be saved after any executed action. 



4.1 Seed of DTC



The behaviours induced by a specification are used for creating dtcs. A behaviour
which is used in the creation of a dtc is called a seed of dtc. A seed could be utilized
as a declarative test case by requiring the iut to exhibit the same sequence of states.
This would be very restrictive: as the iut corresponds to a specific instance of the
specification, the seed should be created using a similar instance, which might not be
known at specification time. Moreover, as such it is a legal behaviour and requiring the
exhibition of legal behaviours is against the idea of testing.

Let $\mathit{seed} = \langle s_0, s_1, s_2, \ldots \rangle$ be a sequence of states, where each state $s_i$ is
composed of the local states $o_0^i, o_1^i, \ldots$ of the objects, $o_j^i$ being the local state of
object $o_j$ in state $s_i$. In order to create a dtc, the seed is abstracted so that we restrict
ourselves only to the objects which participated in the executed actions. These objects
become existentially quantified in the dtc. Let $o_0, o_1, o_2$ be the objects that
participated in the actions when the seed was created. The automatically created dtc is

$$
\mathrm{dtc}(\mathit{seed}) = \exists\, o'_0, o'_1, o'_2 :
\langle\, p_0(o'_0, o'_1, o'_2),\; p_1(o'_0, o'_1, o'_2),\; p_2(o'_0, o'_1, o'_2),\; \ldots,\; p_n(o'_0, o'_1, o'_2) \,\rangle,
$$

where $p_i(o'_0, o'_1, o'_2)$ states that the quantified objects have the same values as "the
corresponding" objects in state $i$ of the seed:

$$
p_i(o'_0, o'_1, o'_2) \equiv o'_0 = o_0^i \wedge o'_1 = o_1^i \wedge o'_2 = o_2^i,
$$

where $o_j^i$ is the local state of object $o_j$ in state $s_i$ of the seed. The dtc can be abstracted further
manually, for example, by removing requirements for some uninteresting attributes.

Fig. 6. The initial state in the DisCo Animation Tool



The DisCo Animation Tool is utilized in creating dtcs. The specifier

1. selects any specification layer,
2. instantiates it,
3. executes a sequence of actions with the tool and
4. saves the initial state and the state after each executed action. This sequence of states
   is a seed for the dtc.
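
The abstraction of a seed into a dtc, as described in this subsection, can be sketched in
Python. The sketch makes several assumptions that are not taken from the DisCo toolset: a
state is a dictionary from object names to local states, a binding is a dictionary from the
same names to the local states of the bound objects, and the function name dtc_from_seed is
hypothetical.

```python
from typing import Callable, Dict, Hashable, List, Sequence

# A state maps each object name to its (hashable) local state.
State = Dict[str, Hashable]
Proposition = Callable[[State], bool]


def dtc_from_seed(seed: Sequence[State],
                  participants: Sequence[str]) -> List[Proposition]:
    """Build one proposition p_i per seed state, restricted to the participants."""

    def make_proposition(snapshot: State) -> Proposition:
        # p_i holds for a binding when every quantified variable has the same
        # local state as the corresponding seed object had in state i.
        def proposition(binding: State) -> bool:
            return all(binding.get(name) == value
                       for name, value in snapshot.items())
        return proposition

    return [make_proposition({name: state[name] for name in participants})
            for state in seed]
```

Objects that are not listed as participants are simply left out of each snapshot, which
corresponds to dropping them from the quantified variables of the dtc.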

4.2 Example 

As an example we show how a dtc is created for the ATM specified in Sect. 3.

Specification layer cards (on the left-hand side of Fig. 5) is selected for the abstraction
level (1st item in the enumerated list in Subsection 4.1).

The instance (2nd item) of the specification consists of an account acc whose balance
is 1000, one card ca, and two tills till1 and till2. Card1 is related to account acc1.
The initial state has been loaded in the DisCo Animation Tool in Fig. 6. The canvas in
the top right-hand corner shows the objects and the relations. Actions are depicted in the
top left-hand corner; enabled actions (insertCard and deposit) are highlighted in
green.

The scenario (items 3 and 4) in this example is a basic case where a card is inserted
into a till, money is withdrawn and the card is ejected:

⟨insertCard(till1, card1),
 withdraw(acc1, till1, 100, card1),
 eject(till1, card1)⟩

Running the scenario and saving the states after each action execution (and the initial
state) with the animator leads to the following seed:

$$
\left\langle
\begin{pmatrix} acc(1000) \\ ca() \\ till1(noCard) \\ till2(noCard) \end{pmatrix},
\begin{pmatrix} acc(1000) \\ ca() \\ till1(hasCard(ca)) \\ till2(noCard) \end{pmatrix},
\begin{pmatrix} acc(900) \\ ca() \\ till1(hasCard(ca)) \\ till2(noCard) \end{pmatrix},
\begin{pmatrix} acc(900) \\ ca() \\ till1(noCard) \\ till2(noCard) \end{pmatrix}
\right\rangle
$$

The actual dtc is created by abstracting the seed by quantifying the participants.
Object till2 is dropped from the dtc, because it did not participate in any of
the executed actions:

$$
\mathrm{dtc}(\mathit{seed}) = \exists\, a \in \mathit{Account};\ c \in \mathit{Card};\ t \in \mathit{Till} :
\langle\, p_0(a, c, t),\; p_1(a, c, t),\; p_2(a, c, t),\; p_3(a, c, t) \,\rangle,
$$

where $p_i(a, c, t)$ states that objects $a$, $c$ and $t$ have the same values as objects acc, ca
and till1 in state $i$ of the seed. Proposition $p_0$ is given as an example:
$p_0(a, c, t) \equiv a = acc(1000) \wedge c = ca() \wedge t = till1(noCard)$.
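
The recording of this seed can be imitated with a small Python model of layer cards. The
model is a simplification written for illustration and is not produced by the DisCo
Animation Tool: states are dictionaries, till states are tuples, relation cardAcc is a
dictionary, all function names are hypothetical, and the object names follow the seed
(acc, ca, till1, till2).

```python
import copy

# Relation cardAcc as a dictionary: each card maps to exactly one account.
CARD_ACC = {"ca": "acc"}


def require(condition: bool, action: str) -> None:
    """Raise if the guard of an action does not hold (the action is not enabled)."""
    if not condition:
        raise ValueError(f"action {action} is not enabled")


def insert_card(state, till, card):
    card_in_use = any(v == ("hasCard", card) for v in state.values())
    require(state[till] == ("noCard",) and not card_in_use, "insertCard")
    state[till] = ("hasCard", card)


def withdraw(state, acc, till, amount, card):
    require(amount > 0 and state[acc] >= amount
            and state[till][0] == "hasCard" and CARD_ACC[card] == acc, "withdraw")
    state[acc] -= amount


def eject_card(state, till, card):
    require(state[till][0] == "hasCard", "ejectCard")
    state[till] = ("noCard",)


def record_seed(initial, scenario):
    """Run the scenario; return the initial state and the state after each action."""
    state = copy.deepcopy(initial)
    seed = [copy.deepcopy(state)]
    for action, args in scenario:
        action(state, *args)
        seed.append(copy.deepcopy(state))
    return seed


INITIAL = {"acc": 1000, "ca": (), "till1": ("noCard",), "till2": ("noCard",)}
SCENARIO = [
    (insert_card, ("till1", "ca")),
    (withdraw, ("acc", "till1", 100, "ca")),
    (eject_card, ("till1", "ca")),
]
SEED = record_seed(INITIAL, SCENARIO)
```

SEED then holds four states, one for the initial state and one after each of the three
actions, mirroring item 4 of the procedure in Subsection 4.1.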



4.3 Running the Tests 

In order to reduce the needed computational resources, the dtc can be instantiated in
actual testing for selected assignments to the quantified variables. In the ATM example
a few cards can first be randomly selected. Relation cardAcc, which is a function
from Card to Account, is used to map the selected cards to the corresponding
accounts. Tills can be selected randomly.
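
A minimal sketch of this selection in Python follows. The function select_bindings, its
parameters and the dictionary representation of relation cardAcc are assumptions made for
illustration; only the selection strategy itself is taken from the text.

```python
import random


def select_bindings(cards, tills, card_acc, sample_size, rng=None):
    """Return (account, card, till) bindings for a random sample of the cards."""
    rng = rng or random.Random()
    cards = list(cards)
    tills = list(tills)
    chosen = rng.sample(cards, min(sample_size, len(cards)))
    # cardAcc is a function from cards to accounts, so the account is determined
    # by the card; the till is picked independently at random.
    return [(card_acc[card], card, rng.choice(tills)) for card in chosen]
```

Only the automata for the returned bindings need to be run, which trades exactness of
detecting the satisfaction of the dtc for lower cost.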

The actual testing with all the required scaffolding to the iut is outside the scope of
this paper. It is simply assumed that the behaviour of the iut is observed in terms of the
most concrete specification atms. During testing the DisCo Animation Tool executes the
same actions as the iut.



4.4 Test Cases 

Specification of an experiment for iut is called a test case. The main component of a 
test case in our setting is a scenario of executed actions at the lowest-level specification 
in the specification hierarchy, which is then used for building the required scaffolding 
to the IUT. 

Test cases are created from abstract declarative test cases manually by augmenting
abstract scenarios with the required details. In the ATM example, augmenting the scenario
declared in Subsection 4.2 means only adding Customer objects to their roles:

⟨insertCard(cust1, till1, card1),
 withdraw(cust1, acc1, till1, 100, card1),
 eject(cust1, till1, card1)⟩

Currently automatic test generation is not an option, but we believe that for typical
cases a script language can be developed to carry out the augmentation. For example, consider
a declarative test case that focuses on the charging feature of a telecommunications switch
by stating that, whenever a call is routed from terminal A to terminal B, the owner of
terminal A is charged. A low-level scenario is then produced by augmenting the abstract
scenario with the actions actually taking care of the routing. This is quite simple, because
in normal cases the routing procedures follow the same pattern. This is left as future
work.

Another approach for creating test cases can be based on utilizing the DisCo Animation
Tool and a probabilistic execution mode of actions. In this mode actions are given
priorities, and more highly prioritized actions are more likely to get selected for execution.







Now, referring to the telecommunications switch example, the testing engineer first
divides the actions into two categories: interface actions (i.e., actions initiating a call or
hanging up the phone) and actions that are internal to the switch (i.e., actions taking care
of the routing). By giving interface actions lower priority than the internal ones, the calls
are likely to get routed properly and the aforementioned observation objective becomes
satisfied. We believe that, typically, if the actions that model aborting a started
procedure (such as hanging up the phone) are given lower priority, it is possible to create
test cases utilizing observation objectives. Also, in reality many systems spend most of
their time executing their internal actions and receive stimuli from the environment only
occasionally. However, using priorities in the creation of test cases is also left as future
work.
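
One possible reading of the probabilistic execution mode is a weighted random choice among
the enabled actions. The sketch below assumes that priorities are used directly as
selection weights; this interpretation, the function choose_action and the action names are
assumptions, since the mode is only outlined in the text.

```python
import random


def choose_action(enabled, priority, rng):
    """Pick one enabled action; a higher priority means a higher probability."""
    weights = [priority[name] for name in enabled]
    return rng.choices(enabled, weights=weights, k=1)[0]


# Hypothetical priorities for the switch example: the internal routing action
# is preferred over the interface action that aborts the call.
PRIORITY = {"hang_up": 1, "route_call": 9}
```

With these weights a call in progress is usually routed before it is hung up, so the
charging objective is likely, but not guaranteed, to be observed.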

4.5 Animator as a Validation Tool 

In [2] we defined an abstraction function which maps behaviours of a more concrete 
specification to behaviours of a more abstract one. The mapping is utilized in validating 
the satisfaction of the dtcs. In the ATM example behaviours of specification atms are 
mapped to behaviours of cards. At this level of abstraction the satisfaction of the dtc 
is validated with the animation tool. 

The correctness of behaviours is verified at the most concrete level (specification 
layer atms in the example). The DisCo animation tool verifies that executed actions 
really were enabled and that the states reported by iut are legal. If they are not, an error 
has been discovered. The error need not have anything to do with the dtc being satisfied, 
since the behaviour may be erroneous with respect to other abstractions of the system. 

For example, an implementation of the ATM might exhibit a behaviour where a card
is inserted into a till, money is withdrawn (but the money does not appear in the customer's
wallet) and the card is ejected. The behaviour satisfies the dtc defined in Subsection
4.2 but is erroneous according to abstraction customers, which requires that the customer
receives the withdrawn money.

5 Conclusions and Future Work 

In this paper we have proposed a method for defining observation objectives for complex 
software systems whose specifications tend to be complex as well. An observation objec- 
tive is a set of behaviours that an implementation under test is expected to exhibit. They 
are defined by a declarative test case, which is a quantified sequence of state expressions: 

$$
\mathrm{DTC} = \exists\, o_0, o_1, \ldots, o_{k-1} \in \mathit{Objects} :
\langle\, p_0(o_0, o_1, \ldots, o_{k-1}),\;
p_1(o_0, o_1, \ldots, o_{k-1}),\;
\ldots,\;
p_{n-1}(o_0, o_1, \ldots, o_{k-1}) \,\rangle
$$

A dtc is satisfied by a behaviour whose states satisfy $p_0$ until $p_1$ is satisfied, and so on
until $p_{n-1}$ is satisfied.

We demonstrated how objectives of interesting behaviours can be given conveniently
if the specification of the iut consists of an abstraction hierarchy. This circumvents the
post-abstraction problem present in currently utilized methods. The DisCo method was utilized
in the example.

The proposed method is formally rigorous but still a practical way of declaring
observation objectives for testing. The DisCo toolset offers a decent platform for specifying
complex systems. DisCo's way of structuring specifications into behavioural units allows
declaring objectives related to features of the system, since each layer can be considered
to model a feature.

The proposed method requires instantiating the specification for some number of 
objects in some initial states. If not carried out carefully this can lead to inferior dtcs. 
For example, if the instantiation does not match the actual implementation, it might be the 
case that the required behaviours cannot be run at all by the iut. Every implementation 
corresponds uniquely to some instance of its DisCo specification. This instance can be 
loaded to DisCo Animation Tool and then the testing engineer can verify that the iut 
is able to exhibit the required behaviour. 

The proposed method does not necessarily lead to an exhaustive test suite with 
respect to traditional coverage methods. The testing engineer is allowed to create any
collection of dtcs that she wishes. In this paper we have used a specification method
whose state space can be infinite, but if some finite state method is utilized, also the 
traditional coverage measures are available. The state expressions of dtcs are created in 
such a way that state coverage, for instance, can be managed. 

5.1 Future Work 

Augmenting the tools with testing facilities remains to be done. There is no formal
mapping from DisCo specifications to actual implementations; implementations are still made
manually. Therefore, some guidelines are needed on how an implementation is created
and, especially when it comes to testing, how an implementation is observed properly.
Currently the observations of an implementation are based on decisions made by the
engineer who implements the system. Automatic code generation is, of course, a branch
for future studies.

In Subsection 4.4 we sketched a few ways for creating actual test cases based on
observation objectives. They can be produced manually by augmenting an abstract
behaviour to a more concrete one, by a script language which takes care of the
augmentations, or by utilizing a probabilistic execution mode of actions, which produces the
wanted behaviour with some likelihood. These aspects need more research before we
can implement them in the toolset.

Abstract. Testing is the primary software validation technique used by industry
today, but remains ad hoc, error prone, and very expensive. A promising improve- 
ment is to automatically generate test cases from formal models of the system 
under test. 

We demonstrate how to automatically generate real-time conformance test cases
from timed automata specifications. Specifically we demonstrate how to efficiently
generate real-time test cases with optimal execution time, i.e., test cases that are the
fastest possible to execute. Our technique allows time optimal test cases to be
generated using manually formulated test purposes or generated automatically
from various coverage criteria of the model.



1 Introduction 

Testing is the execution of the system under test in a controlled environment following 
a prescribed procedure with the goal of measuring one or more quality characteristics 
of a product, such as functionality or performance. Testing is the primary software 
validation technique used by industry today. However, despite the importance and the 
many resources and man-hours invested by industry (about 30% to 50% of development 
effort), testing remains quite ad hoc and error prone. 

We focus on conformance testing, i.e., checking by means of execution whether the
behavior of some black box implementation conforms to that of its specification, and
moreover doing this within minimum time. A promising approach to improving the
effectiveness of testing is to base test generation on an abstract formal model of the
system under test (SUT) and use a test generation tool to (automatically or user guided)
generate and execute test cases. Model based test generation has been under scientific
study for some time, and practically applicable test tools are emerging [4,16,18,10].
However, little is still known in the context of real-time systems.

An important principal problem in generating real-time test cases is to compute
when to stimulate the system and expect response, and to compute the associated correct
verdict. This usually requires (symbolic) analysis of the model, which in turn may lead
to the state explosion problem. Another problem is how to select a very limited set of
test cases to be executed from the extremely large number (usually infinitely many) of
potential ones.

This paper demonstrates how it is possible to generate time-optimal test cases and
test suites, i.e., test cases and suites that are guaranteed to take the least possible time
to execute. The required behavior is specified using a deterministic and output urgent
class of Uppaal style timed automata. The Uppaal model checking tool implements
a set of efficient data-structures and algorithms for symbolic reachability analysis of
timed automata. We then use the fastest diagnostic trace facility of the Uppaal tool to
generate time optimal test sequences. Test cases can either be selected through manually
formulated test purposes or automatically from three natural coverage criteria (such as
transition or location coverage) of the timed automata model.

Time optimal test suites are interesting for several reasons. First, reducing the total
execution time of a test suite allows more behavior to be tested in the (limited) time
allocated to testing. Second, it is generally desirable that regression testing can be
executed as quickly as possible to improve the turnaround time between software revisions.
Third, it is essential for product instance testing that a thorough test can be performed
without testing becoming the bottleneck, i.e., the test suite can be applied to all products
coming off an assembly line. Finally, in the context of testing of real-time systems, we
hypothesize that the fastest test case that drives the SUT to some state also has a high
likelihood of detecting errors, because this is a stressful situation for the SUT to handle.

The rest of the paper is organized as follows: Section 2 discusses related work, and
Section 3 introduces our framework for testing real-time systems based on a testable
subclass of timed automata. Sections 4 and 5 describe how to encode test purposes and
test criteria, and report experimental results respectively. Section 6 concludes the paper.

2 Related Work 

Relatively few proposals exist that deal explicitly and systematically with testing
real-time properties [11,9,6,17,8,5,7,14,15]. In [5,8,17] test sequences are generated from a
timed automaton (TA) by applying variations of finite state machine (FSM) checking
sequence techniques (see e.g. [13]) to a discretization of the state space. Experience shows
that this approach suffers seriously from the state explosion problem and the resulting large
number of test sequences. The work in [9] and [11] also uses checking sequences, but is
based on different structures and state verification methods. Both assume determinism,
but not output urgency. To distinguish sequences that can always be executed to completion
independent of output timing from sequences that may be executed to completion,
[9] defines may- and must-traceability of transition sequences in a TA. The unique IO
sequence (UIOv) method is then applied to an FSM derived from the TA by simply
removing the clock conditions on transitions. The sequences are then checked for their
may- and must-traceability, and the procedure is re-iterated when necessary. This may
result in many iterations and in incomplete test suites. The work in [11] assumes a further
restricted TA model where all transitions with the same observable action reset the
same set of clocks. The TA is first translated into a (larger) alternative automaton where
clock constraints are represented as set-timer and expire-timer events. Based on this, the
generalized Wp method is used to compute checking sequences.

In most FSM based approaches, tests are selected based on a fault model identifying
implementation faults that are desired to be (or can be) detected during testing. Little
or no evidence is given to support that the real-time fault models correspond to faults
that occur frequently in practice. Another problem is the required assumptions about
the number of states in the SUT, which in general is difficult to estimate. The coverage
approach guarantees that the test suite is derived systematically and that it provides a
certain level of thoroughness, which is important in industrial practice. It is important
to stress that this is a practically founded heuristic test selection technique. Similarly,
when time optimal sequences are generated, this is also a level of test selection, where
only the fastest to execute are selected. Our goal is not full fault coverage that will in
principle guarantee that the SUT is correct if it passes all generated tests.

A different approach to test generation and selection is [6] where a manually stated 
test purpose is used to define the desired sequences to be observed on the SUT. A 
synchronous product of the test purpose and TA model is first formed and used to 
extract a symbolic test sequence with timing constraints that reach a goal state of the test 
purpose. This symbolic trace can be interpreted at execution time to give a final verdict. 
This work does not address test suite optimization or time optimality, does not address 
test generation without an explicit test purpose, and does not appear to be implemented in 
a tool. [15] proposes a fully automatic method for generation of real-time test sequences 
from a subclass of TA called event-recording automata which restricts how clocks are 
reset. The technique is based on symbolic analysis and coverage of a coarse equivalence 
class partitioning of the state space. 

Our work is based on existing efficient and well proven symbolic analysis techniques
of TA, and unlike others addresses time optimal testing. Most other work on optimizing
test suites, e.g., [1,19,10], focuses on minimizing the length of the test suite, which is not
directly linked to the execution time because some events take longer to produce or
real-time constraints are ignored. Others have used (untimed) model-checking tools to
produce test suites for various model coverage criteria, e.g., [10].

The main contributions of the paper are 1) application of time and cost optimal
reachability analysis algorithms to the context of time-optimal test case generation, 2)
an automatic technique to generate time optimal covering test suites for three important
coverage criteria, 3) a test generation tool, obtained through creative use of the diagnostic
trace facility of Uppaal, that is based on efficient and well-proven algorithms, and
finally 4) experimental evidence that the proposed technique has practical
merits.

3 Timed Automata and Testing 

We will assume that both the system under test (SUT) and the environment in which it 
operates are modeled as TA. 

3.1 Timed Automata 

Let $X$ be a set of non-negative real-valued variables called clocks, and $Act = I \cup
O \cup \{\tau\}$ a set of input actions $I$ and output actions $O$ (denoted $a?$ and $a!$), and the
non-synchronizing action (denoted $\tau$). Let $G(X)$ denote the set of guards on clocks
being conjunctions of simple constraints of the form $x \bowtie c$, and let $U(X)$ denote the
set of updates of clocks corresponding to sequences of statements of the form $x := c$,
where $x \in X$, $c \in \mathbb{N}$, and $\bowtie \in \{\le, <, =, \ge\}$.¹ A timed automaton (TA) over $(Act, X)$
is a tuple $(L, \ell_0, I, E)$, where $L$ is a set of locations, $\ell_0 \in L$ is an initial location,
$I : L \to G(X)$ assigns invariants to locations, and $E$ is a set of edges such that $E \subseteq
L \times G(X) \times Act \times U(X) \times L$. We write $\ell \xrightarrow{g,a,u} \ell'$ iff $(\ell, g, a, u, \ell') \in E$.

The semantics of a TA is defined in terms of a timed transition system over states of
the form $p = (\ell, \sigma)$, where $\ell$ is a location and $\sigma \in \mathbb{R}_{\ge 0}^{X}$ is a clock valuation satisfying
the invariant of $\ell$. Intuitively, there are two kinds of transitions: delay transitions and
discrete transitions. In delay transitions, $(\ell, \sigma) \xrightarrow{d} (\ell, \sigma + d)$, the values of all clocks
of the automaton are incremented by the amount of the delay, $d$. Discrete transitions
$(\ell, \sigma) \xrightarrow{a} (\ell', \sigma')$ correspond to execution of edges $(\ell, g, a, u, \ell')$ for which the guard $g$
is satisfied by $\sigma$. The clock valuation $\sigma'$ of the target state is obtained by modifying $\sigma$
according to updates $u$. We write $p \xrightarrow{\gamma}$ as a short for $\exists p'.\ p \xrightarrow{\gamma} p'$, $\gamma \in Act \cup \mathbb{R}_{\ge 0}$. A
timed trace is a sequence of alternating time delays and actions in $Act$.
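
The two kinds of transitions can be made concrete with a small Python sketch. It is only an illustration of the semantics above, not part of Uppaal: the names `Edge`, `delay` and `fire` are hypothetical, and the sketch assumes that invariants are upper bounds on clocks, so that checking the invariant at the end of a delay is enough.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

Valuation = Dict[str, float]      # clock name -> non-negative value
State = Tuple[str, Valuation]     # (location, clock valuation)


@dataclass(frozen=True)
class Edge:
    source: str
    guard: Callable[[Valuation], bool]
    action: str
    resets: Tuple[Tuple[str, int], ...]   # updates of the form x := c
    target: str


def delay(state: State, d: float, invariant) -> Optional[State]:
    """Delay transition (l, sigma) --d--> (l, sigma + d).

    Assumption: invariants are upper bounds, so it suffices to check
    the invariant of the location after the delay.
    """
    loc, sigma = state
    later = {clock: value + d for clock, value in sigma.items()}
    return (loc, later) if invariant(loc, later) else None


def fire(state: State, edge: Edge, invariant) -> Optional[State]:
    """Discrete transition along `edge`, if its guard is satisfied."""
    loc, sigma = state
    if loc != edge.source or not edge.guard(sigma):
        return None
    updated = dict(sigma)
    for clock, value in edge.resets:
        updated[clock] = float(value)     # apply the update x := c
    return (edge.target, updated) if invariant(edge.target, updated) else None
```

A state from which neither function returns a successor for any delay or edge is a dead-end of the kind discussed later for test suite generation.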

A network of TA $A_1 \| \cdots \| A_n$ over $(Act, X)$ is defined as the parallel composition
of $n$ TA over $(Act, X)$. Semantically, a network again describes a timed transition system
obtained from those of the components by requiring synchrony on delay transitions
and requiring discrete transitions to synchronize on complementary actions (i.e. $a?$ is
complementary to $a!$).

3.2 Uppaal and Time Optimal Reachability Analysis 

Uppaal is a verification tool for a TA based modeling language. Besides dense clocks, 
the tool supports both simple and complex data types like bounded integers and arrays 
as well as synchronization via shared variables and actions. The specification language 
supports safety, liveness, deadlock, and response properties. 

To produce test sequences, we shall make use of Uppaal's ability to generate diagnostic
traces witnessing a submitted safety property. Currently Uppaal supports three
options for diagnostic trace generation: some trace leading to the goal state, the shortest
trace with the minimum number of transitions, and the fastest trace with the shortest
accumulated time delay. The underlying algorithm used for finding time-optimal traces
is a variation of the A*-algorithm [2,12]. Hence, to improve performance it is possible
to supply a heuristic function estimating the remaining cost from any state to the goal
state.

Throughout the paper we use Uppaal syntax to illustrate TA, and the figures are direct 
exports from Uppaal. Initial locations are marked using a double circle. Edges are by 
convention labeled by the triple: guard, action, and assignment in that order. The internal
τ-action is indicated by an absent action-label. Committed locations are indicated by a
location with an encircled “C”. A committed location must be left immediately as the 
next transition taken by the system. Finally, bold-faced clock conditions placed under 
locations are location invariants. 

3.3 Deterministic, Input Enabled and Output Urgent TA 

To ensure time optimal testability, the following semantic restrictions turn out to be
sufficient. Following similar restrictions as in [17], we define the notion of deterministic,
input enabled and output urgent TA, DIEOU-TA, by restricting the underlying timed
transition system defined by the TA as follows:

1. Determinism. Two transitions with the same label lead to the same state, i.e., for
every semantic state $p = (\ell, \sigma)$ and action $\gamma \in Act \cup \mathbb{R}_{\ge 0}$, whenever $p \xrightarrow{\gamma} p'$
and $p \xrightarrow{\gamma} p''$ then $p' = p''$.

2. (Weak) input enabled. At any time any input action is enabled, i.e., whenever $p \xrightarrow{d}$
for some delay $d \in \mathbb{R}_{\ge 0}$ then $\forall a \in I.\ p \xrightarrow{a}$.

3. Isolated Outputs. If an output (or $\tau$) is enabled then no other input or output transition
is enabled, i.e., $\forall \alpha \in O \cup \{\tau\}.\ \forall \beta \in O \cup I \cup \{\tau\}$ whenever $p \xrightarrow{\alpha}$ and $p \xrightarrow{\beta}$ then
$\alpha = \beta$.

4. Output urgency. When an output (or $\tau$) is enabled, it will occur immediately, i.e.,
whenever $p \xrightarrow{\alpha}$, $\alpha \in O \cup \{\tau\}$, then $p \not\xrightarrow{d}$, $d \in \mathbb{R}_{\ge 0}$.

¹ To simplify the presentation in the rest of the paper, we restrict to guards with non-strict lower
bounds on clocks.

We assume that the test specification is given as a closed network of TA that can be 
partitioned into one subnetwork modeling the behavior of the SUT, and one modeling 
the behavior of its environment (ENV), see Figure 1. Often the SUT operates in specific
environments, and it is only necessary to establish correctness under the (modeled) 
environment assumptions; otherwise the environment model can be replaced with a 
completely unconstrained one that allows all possible interaction sequences. 

We assume that the tester can take the place of the environment and control the SUT
via a distinguished set of observable input and output actions. For the SUT to be testable,
the subnetwork modeling it should be controllable in the sense that it should be possible
for an environment to drive the subnetwork model through all of its syntactical parts (e.g.
transitions and locations). We therefore assume that the SUT specification is a DIEOU-TA,
and that the SUT can be modeled by some unknown DIEOU-TA (this assumption
is commonly referred to as the testing hypothesis). The environment model need not be
a DIEOU-TA.

We use the simple light switch controller in Figure 2 to illustrate the concepts. 
The user interacts with the controller by touching a touch sensitive pad. The light has 
three intensity levels: OFF, DIMMED, and BRIGHT. Depending on the timing between 
successive touches (recorded by the clock x), the controller toggles the light levels. For 
example, in dimmed state, if a second touch is made quickly (before the switching time
Tsw = 4 time units) after the touch that caused the controller to enter dimmed state
(from either off or bright state), the controller increases the level to bright. Conversely,
if the second touch happens after the switching time, the controller switches the light
off. If the light controller has been in off state for a long time (longer than or equal to
Tidle = 20), it should reactivate upon a touch by going directly to bright level. We leave
it to the reader to verify for herself that the conditions of DIEOU-TA are met by the 
model given. 
Fig. 3. Two possible environment models for the simple light switch



The environment model shown in Figure 3(a) models a user capable of performing
any sequence of touch actions. When the constant Treact is set to zero he is arbitrarily
fast. A more realistic user is only capable of producing touches at a limited rate; this
can be modeled by setting Treact to a non-zero value. Figure 3(b) models a different user
able to make two quick successive touches (counted by integer variable t), but who
is then required to pause for some time (to avoid cramp), e.g., Tpause = 5.

3.4 From Diagnostic Traces to Test Cases 

Let A be the TA network model of the SUT together with its intended environment 
ENV. A diagnostic trace produced by Uppaal for a given reachability question on A 
demonstrates the sequence of moves to be made by each of the system components 
and the required clock constraints needed to reach the targeted location. A (concrete) 
diagnostic trace will have the form: 

$(S_0, E_0) \xrightarrow{\gamma_0} (S_1, E_1) \xrightarrow{\gamma_1} (S_2, E_2) \xrightarrow{\gamma_2} \cdots (S_n, E_n)$

where $S_i$, $E_i$ are states of the SUT and ENV, respectively, and $\gamma_i$ are either time-delays
or synchronization (or internal) actions. The latter may be further partitioned into purely 
SUT or ENV transitions (hence invisible for the other part) or synchronizing transitions 
between the SUT and the ENV (hence observable for both parties). 
Fig. 4. Test case automaton for the sequence i_0! · delay · o_0?

For DIEOU-TA a test sequence is an alternating sequence of concrete delay actions
and observable actions. From the diagnostic trace above a test sequence, $\lambda$, may be
obtained simply by projecting the trace to the ENV-component, while removing invisible
transitions, and summing adjacent delay actions. Finally, a test case to be executed on
the real SUT implementation may be obtained from $\lambda$ by the addition of verdicts.
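
The projection step can be sketched in a few lines of Python. This is an illustration under assumptions, not the tool's implementation: a trace is taken to be a flat list in which numbers are delays and strings are action names, and the function name `to_test_sequence` and the action name `tau` are hypothetical.

```python
from typing import List, Set, Union

Step = Union[int, float, str]   # a delay (number) or an action name (string)


def to_test_sequence(trace: List[Step], observable: Set[str]) -> List[Step]:
    """Project a diagnostic trace onto observable actions.

    Invisible actions are dropped and the delays that become adjacent
    as a result are summed into a single delay.
    """
    sequence: List[Step] = []
    for step in trace:
        if isinstance(step, str):
            if step in observable:
                sequence.append(step)
            # otherwise: internal to one component, hence not observable
        elif sequence and not isinstance(sequence[-1], str):
            sequence[-1] += step          # merge with the preceding delay
        else:
            sequence.append(step)
    return sequence
```

For example, `to_test_sequence([1.5, "tau", 2.5, "touch!"], {"touch!"})` drops the internal step and merges the two delays into one.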

Adding the verdicts requires some comments on the chosen correctness relation between
the specification and SUT. In this paper we require timed trace inclusion, i.e. that
the timed traces of the implementation are included in those of the specification. Thus after any
input sequence, the implementation is allowed to produce an output only if the specifica- 
tion is also able to produce that output. Similarly, the implementation may delay (thereby 
staying silent) only if the specification also may delay. The test sequences produced by 
our techniques are derived from diagnostic traces, and are thus guaranteed to be included 
in the specification. 

To clarify the construction we may model the test case itself as a TA $A_\lambda$ for the test
sequence $\lambda$. Locations in $A_\lambda$ are labeled using two distinguished labels, pass and fail.
The execution of a test case is now formalized as a parallel composition of the test case
automaton $A_\lambda$ and SUT $A_S$.

$S$ passes $A_\lambda$ iff $A_\lambda \| A_S \not\Rightarrow$ fail (i.e. the composition cannot reach a fail location)

$A_\lambda$ is constructed such that a complete execution terminates in a fail state if the SUT
cannot perform $\lambda$ and such that it terminates in a pass state if the SUT can execute all
actions of $\lambda$. The construction is illustrated in Figure 4.
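
The verdict assignment can be illustrated by running a test sequence against a simulated SUT. The Python sketch below is an assumption-laden illustration: `LightSwitchStub` and `run_test` are hypothetical names, the stub follows the light switch behavior described in Section 3.3, and its rule for the bright state (quick touch gives off, late touch gives dim) is inferred from the generated test sequences rather than stated in the text.

```python
T_SW, T_IDLE = 4, 20   # switching and idle times from the light switch example


class LightSwitchStub:
    """Hypothetical stand-in for the SUT; every touch is answered urgently."""

    def __init__(self):
        self.level, self.x, self.pending = "off", 0.0, None

    def advance(self, d):
        """Let d time units pass; return outputs as (offset, name) pairs."""
        events = []
        if self.pending is not None:
            events.append((0.0, self.pending))   # urgent output, zero delay
            self.level, self.x, self.pending = self.pending, 0.0, None
        self.x += d
        return events

    def send(self, name):
        if name != "touch":
            raise ValueError(f"unknown input: {name}")
        if self.level == "off":
            self.pending = "bright" if self.x >= T_IDLE else "dim"
        elif self.level == "dim":
            self.pending = "bright" if self.x < T_SW else "off"
        else:  # bright: rule inferred from the generated sequences
            self.pending = "off" if self.x < T_SW else "dim"


def run_test(sequence, sut):
    """Execute delay/action pairs; 'a!' is sent, 'a?' is an expected output."""
    for wait, action in zip(sequence[0::2], sequence[1::2]):
        events = sut.advance(wait)
        name, kind = action[:-1], action[-1]
        if kind == "!":
            if events:
                return "fail"          # output where silence was required
            sut.send(name)
        elif events != [(float(wait), name)]:
            return "fail"              # wrong, missing, or mistimed output
    return "pass"
```

Running the sequence 20 · touch! · 0 · bright? against a fresh stub yields pass, whereas 0 · touch! · 0 · bright? yields fail because the stub answers with dim.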

4 Test Generation 

4.1 Single Purpose Test Generation 

A common approach to the generation of test cases is to first manually formulate a set 
of informal test purposes and then to formalize these such that the model can be used 
to generate one or more test cases for each test purpose. A test purpose is a specific test
objective (or property) that the tester would like to observe on the SUT. 
Fig. 5. Test Environment for TP2



Because we use the diagnostic trace facility of a model-checker based on reacha- 
bility analysis, the test purpose must be formulated as a property that can be checked 
by reachability analysis of the combined ENV and SUT model. We propose different 
techniques for this. Sometimes the test purpose can be directly transformed into a simple 
location reachability check. In other cases it may require decoration of the model with 
auxiliary flag variables. Another technique is to replace the environment model with a 
more restricted one that matches the behavior of the test purpose only. 

TP1: Check that the light can become bright.

TP2: Check that the light switches off after three successive touches.

TP1 can be formulated as a simple reachability property: E<> LightController.bright
(i.e. eventually in some future the LightController automaton enters location
bright).

Generating the shortest diagnostic trace results in the test sequence: 20 · touch! · 0 ·
bright?. However, the fastest sequence satisfying the purpose is 0 · touch! · 0 · dim? ·
0 · touch! · 0 · bright?.

TP2 can be formalized using the restricted environment model² in Figure 5 with
the property E<> tpEnv.goal. The fastest test sequence is 0 · touch! · 0 · dim? · 0 ·
touch! · 0 · bright? · 0 · touch! · 0 · off?.
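
The difference between the shortest and the fastest trace for TP1 can be reproduced on a strongly simplified model. The Python sketch below is not how Uppaal works (Uppaal analyses dense time symbolically); it assumes that the only waiting times worth trying before a touch are 0, Tsw and Tidle, that the clock is reset whenever a new level is entered, and that a quick touch in bright leads to off while a late one leads to dim (inferred from the generated sequences). The names `touch` and `search` are hypothetical.

```python
import heapq

T_SW, T_IDLE = 4, 20
DELAYS = (0, T_SW, T_IDLE)   # candidate waiting times before a touch (assumption)


def touch(level, x):
    """Level entered when the pad is touched after x time units in `level`."""
    if level == "off":
        return "bright" if x >= T_IDLE else "dim"
    if level == "dim":
        return "bright" if x < T_SW else "off"
    return "off" if x < T_SW else "dim"


def search(goal, fastest):
    """Return (accumulated delay, delays before each touch) for reaching `goal`.

    With fastest=True states are ordered by accumulated time first,
    otherwise by number of touches first (the shortest trace).
    """
    queue = [((0, 0), 0, "off", ())]
    seen = set()
    while queue:
        _, time, level, delays = heapq.heappop(queue)
        if level == goal:
            return time, delays
        if level in seen:
            continue
        seen.add(level)
        for d in DELAYS:
            t, n = time + d, len(delays) + 1
            key = (t, n) if fastest else (n, t)
            heapq.heappush(queue, (key, t, touch(level, d), delays + (d,)))
    return None
```

Under these assumptions `search("bright", fastest=False)` returns a single touch after 20 time units, and `search("bright", fastest=True)` returns two immediate touches with accumulated time 0, matching the two sequences given for TP1.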

4.2 Coverage Based Test Generation 

Often the tester is interested in creating a test suite that ensures that the specification or 
implementation is covered in a certain way. This ensures that a certain level of systematicity
and thoroughness has been achieved in the test generation process. Here we explain how 
test sequences with guaranteed coverage of the SUT model can be computed using 
reachability analysis, effectively giving automated tool support. In the next subsection, 
we show how to generalize the technique to generate sets of test sequences. 

A large suite of coverage criteria have been proposed in the literature, such as state- 
ment, transition, and definition-use coverage, each with its own merits and application 
domain. We explain how to apply some of these to TA models. 

² It is possible to use Uppaal's committed location feature to compose the test purpose and
environment model in a compositional way. Space limitations prevent us from elaborating on
this approach.

Edge Coverage: A test sequence satisfies the edge-coverage criterion if, when executed
on the model, it traverses every edge of the selected TA-components. Edge coverage
can be formulated as a reachability property in the following way: add an auxiliary variable
$e_i$ of type boolean (initially false) for each edge to be covered (typically realized
as a bit array in Uppaal), and add to the assignments of each edge $i$ an assignment
$e_i$ := true; a test suite can be generated by formulating a reachability property requiring
that all $e_i$ variables are true: E<> ( e_0==true and e_1==true ... e_n==true ).
The auxiliary variables are needed to enable formulation of the coverage criterion as a
reachability property using the Uppaal property specification language, which is a restricted
subset of CTL.
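
Writing the conjunction by hand becomes tedious as the number of edges grows. A minimal Python sketch for producing the property text is given below; the function name `edge_coverage_query` and the array name `e` are assumptions made for illustration, and the auxiliary variables are assumed to be declared as a boolean array in the model.

```python
def edge_coverage_query(num_edges: int, array: str = "e") -> str:
    """Build a reachability property requiring every coverage bit to be true."""
    if num_edges < 1:
        raise ValueError("at least one edge is required")
    conjuncts = " and ".join(f"{array}[{i}]==true" for i in range(num_edges))
    return f"E<> ({conjuncts})"
```

For the light switch discussed next, `edge_coverage_query(12)` produces a conjunction over the 12 elements of the bit array.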

The light switch in Figure 2 requires a bit-array of 12 elements (one per edge). 
When the environment can touch arbitrarily fast the generated fastest edge covering test 
sequence has the accumulated execution time 28. The solution (there might be more 
traces with the same fastest execution time) generated by Uppaal is: 

EC: 0 · touch! · 0 · dim? · 0 · touch! · 0 · bright? · 0 · touch! · 0 · off? · 20 · touch! · 0 ·
bright? · 4 · touch! · 0 · dim? · 4 · touch! · 0 · off?.

Location Coverage: A test sequence satisfies the location-coverage criterion if,
when executed on the model, it visits every location of the selected TA-components. To
generate test sequences with location coverage, we introduce an auxiliary variable $s_i$ of
type boolean (initially false for all locations except the initial) for each location $\ell_i$ to
be covered. For every edge with destination $\ell_i$: $\ell' \xrightarrow{g,a,u} \ell_i$, add to the assignments $u$
the assignment $s_i$ := true; the reachability property will then require all $s_i$ variables to be true.

Definition-Use Pair Coverage: The definition-use pair criterion is a data-flow cov- 
erage technique where the idea is to cover paths in which a variable is defined (i.e. appears 
in the left-hand side of an assignment) and later is used (i.e. appears in a guard or the 
right-hand side of an assignment). Due to space limitations, we restrict the presentation
to clocks, which can be used in guards only.

We use $(v, e_d, e_u)$ to denote a definition-use pair (DU-pair) for variable $v$ if $e_d$ is
an edge where $v$ is defined and $e_u$ is an edge where $v$ is used. A DU-pair $(v, e_d, e_u)$ is
valid if $e_u$ is reachable from $e_d$ and $v$ is not redefined in the path from $e_d$ to $e_u$. A test
sequence covers $(v, e_d, e_u)$ iff (at least) once in the sequence, there is a valid DU-pair
$(v, e_d, e_u)$. A test sequence satisfies the (all-uses) DU-pair coverage criterion of $v$ if it
covers all valid DU-pairs of $v$.

To generate test sequences with definition-use pair coverage, we assume that the
edges of a model are enumerated, so that $e_i$ is the number of edge $i$. We introduce
an auxiliary data-variable $v_d$ (initially false) with value domain $\{\mathit{false}\} \cup \{1, \ldots, |E|\}$
to keep track of the edge at which variable $v$ was last defined, and a two-dimensional
boolean array $du$ of size $|E| \times |E|$ (initially false) to store the covered pairs. For each
edge $e_i$ at which $v$ is defined we add $v_d := e_i$, and for each edge $e_j$ at which $v$ is used
we add the conditional assignment if ($v_d \neq$ false) then $du[v_d, e_j]$ := true. Note that if
$v$ is both used and defined on the same edge, the array assignment must be made before
the assignment of $v_d$.

The reachability property will then require all du[i,j] representing valid DU-pairs 
to be true for the (all-uses) DU-pair criterion. Note that a test sequence satisfying the 
DU-pair criterion for several variables can be generated using the same encoding, but 
extended with one auxiliary variable and array for each covered variable. 
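
The bookkeeping performed by the auxiliary variables can be mimicked outside the model. The following Python sketch is an illustration only: `covered_du_pairs` is a hypothetical helper, `None` plays the role of the value false of $v_d$, and a set of pairs replaces the boolean array $du$.

```python
def covered_du_pairs(path, defines, uses):
    """Return the DU-pairs of one variable covered by a sequence of edges.

    path    -- edge numbers in the order they are traversed
    defines -- set of edges on which the variable is defined
    uses    -- set of edges on which the variable is used
    """
    last_def = None      # corresponds to v_d, with None for "false"
    covered = set()      # corresponds to the entries of du that are true
    for edge in path:
        if edge in uses and last_def is not None:
            covered.add((last_def, edge))   # record the use first ...
        if edge in defines:
            last_def = edge                 # ... then the (re)definition
    return covered
```

The order of the two tests inside the loop reflects the note above: on an edge that both uses and defines the variable, the pair is recorded with the previous definition.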







4.3 Test Suite Generation 

Often a single covering test sequence cannot be obtained for a given test purpose or
criterion (e.g. due to dead-ends in the model). To solve this problem, we allow for the
model (and SUT) to be reset to its initial state, and to continue the test after the reset
to cover the remaining parts. The generated test will then be interpreted as a test suite
consisting of a set of test sequences separated by resets (assumed to be implemented
correctly in the SUT).

To introduce resets in the model, we shall allow the user to designate some locations
as being reset-able. Obviously, performing a reset may take some time that must
be taken into consideration when generating time optimal test sequences. Reset-able
locations can be encoded into the model by adding reset transitions leading back to the
initial location. Let $x_r$ be an additional clock used for reset purposes, and let $\ell$ be a
reset-able location. Two reset-edges and a new location $\ell'$ must then be added from $\ell$ to
the initial location $\ell_0$, i.e.,

$\ell \xrightarrow{\ reset!,\ x_r := 0\ } \ell' \xrightarrow{\ x_r == T_r,\ \tau,\ u_0\ } \ell_0$

where $T_r$ is the time required to perform the reset.

Here $u_0$ are the assignments needed to reset clocks and other variables in the model
(excluding auxiliary variables encoding test purpose or coverage criteria³). If more than
one component is present in either the SUT-model or environment model, the reset- 
action must be communicated atomically to all of them. This can be done using the 
committed location feature of Uppaal. Further note that it may be possible to obtain 
faster (covering) test suites, if more reset-able locations are added, obviously depending 
on the time required to perform the reset, at the expense of increased model size. 

4.4 Environment Behavior 

A potential problem of the techniques presented above is that the generated test sequences 
may be non-realizable, in that they may require the environment of SUT to operate 
infinitely fast. In general, it is only necessary to establish correctness of SUT under 
the (modeled) environment assumptions. Therefore assumptions about the environment 
can be modeled explicitly and will then be taken into account during test sequence 
generation. In the following, we demonstrate how different environment assumptions 
affect the generated test sequences. 

Consider an environment where the user takes at least 2 time units between each
touch action; such an environment can be obtained by setting the constant Treact to 2 in
Figure 3(a). The fastest test sequences become:

TP1: 0 · touch! · 0 · dim? · 2 · touch! · 0 · bright?

TP2: 0 · touch! · 0 · dim? · 2 · touch! · 0 · bright? · 2 · touch! · 0 · off?.

Also reexamine the test suite EC generated by edge coverage, and compare with the one
of execution time 32 generated when Treact equals 2:

EC': 0 · touch! · 0 · dim? · 4 · touch! · 0 · off? · 20 · touch! · 0 · bright? · 4 · touch! ·
0 · dim? · 2 · touch! · 0 · bright? · 2 · touch! · 0 · off?.

³ In the encoding of DU-pair coverage, the variables $v_d$ should be set to false at resets.

When the environment is changed to the pausing user (who can perform 2 successive
quick touches after which he is required to pause for some time: reaction time 2, pausing
time 5), the fastest sequence has execution time 33, and follows a completely different
strategy that ensures that one of the additional waiting times Tpause is overlapped with
a position where the tester needed to wait anyway.

EC'': 0 · touch! · 0 · dim? · 2 · touch! · 0 · bright? · 5 · touch! · 0 · dim? · 4 · touch! ·
0 · off? · 20 · touch! · 0 · bright? · 2 · touch! · 0 · off?.
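
The execution times quoted for the three edge-covering sequences (28, 32 and 33 time units) are simply the sums of their delays, which can be checked mechanically. The Python sketch below uses hypothetical helper names and encodes each sequence as a list in which numbers are delays and strings are actions; it also lists the instants at which the tester touches the pad, so the environment assumptions can be inspected.

```python
def execution_time(sequence):
    """Accumulated delay of a test sequence."""
    return sum(step for step in sequence if not isinstance(step, str))


def touch_times(sequence):
    """Absolute time of each touch! action in the sequence."""
    now, times = 0, []
    for step in sequence:
        if isinstance(step, str):
            if step == "touch!":
                times.append(now)
        else:
            now += step
    return times


def seq(*steps):
    """Expand (delay, output) pairs; each delay is followed by a touch."""
    out = []
    for wait, output in steps:
        out += [wait, "touch!", 0, output]
    return out


EC = seq((0, "dim?"), (0, "bright?"), (0, "off?"),
         (20, "bright?"), (4, "dim?"), (4, "off?"))
EC1 = seq((0, "dim?"), (4, "off?"), (20, "bright?"),
          (4, "dim?"), (2, "bright?"), (2, "off?"))      # Treact = 2
EC2 = seq((0, "dim?"), (2, "bright?"), (5, "dim?"),
          (4, "off?"), (20, "bright?"), (2, "off?"))     # pausing user
```

With this encoding, `touch_times(EC1)` gives touches at times 0, 4, 24, 28, 30 and 32, so consecutive touches are at least 2 time units apart, as required by the slower user.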



5 Experiments 

In the previous section we presented techniques to compute time optimal covering test
suites. In the following we show empirically that the performance of our technique is
sufficient for practically relevant examples, and indicate how heuristic search methods
can be used to compute optimal or near optimal test cases from very large models. We
are concerned with both the execution time of the generated test sequence, and the time
and memory used to generate it.

5.1 The Touch Sensitive Switch 

Most of the experiments reported here are based on a model of a touch sensitive light 
switch (TSS). It has Max levels of brightness (0 corresponds to off). The lamp is operated 
by touching its wire, i.e. the wire can be grasped and released. The behavior of the 
controller can be expressed as follows: If the light is on, then a single grasp and release
of the wire will switch off the light. If the light is off, then a single grasp and release
will switch on the light at the previous brightness level. Continuous holding of the
wire increases the brightness (resp. decreases) if it was previously decreasing (resp.
increasing). Once the maximum (resp. minimum) level is reached the brightness level
decreases (resp. increases).

In reality a user can only perform two actions on the wire: grasp and release,
and the time-separation between the two events is translated into either nothing (if the
separation is very short), touch if it is short, and into a starthold and endhold pair
if the separation is long. In the Uppaal model this translation is done by the interface
component, shown in Figure 6(a). The dimmer component shown in Figure 7 reacts
to starthold and endhold actions with a dimming effect. When changing the
brightness level L, it is assumed that some maximum time (Delay) will elapse between
two levels. The switch component shown in Figure 6(b) reacts to touch events by
switching the light on to the previous light level OL, or off. The user is modeled in
Figure 6(c).

We vary the model in two ways. First, the user may be patient or impatient. The 
impatient user insists on requiring interaction at least every Wait = 15 time units 
controlled by the invariant in user - this makes it harder for the user to change the 
intensity because he “gives up” the hold after just increasing the light one level. This 
invariant is removed in the patient user. Secondly, we vary the number of light levels 
from Max = 10 and up. 
Fig. 6. Interface Automaton (a), Switch Automaton (b), and User Automaton (c)




Table 8 shows the optimal execution times (in time units) for test suites generated 
from different coverage criteria of the TSS, or selected subsets of components thereof, 
and the length (number of transitions) of the generated test suite. We notice that the 
patient user results in shorter and faster traces in our experiments. 

5.2 System Size and Environment Behavior 

To see how our technique scales, we increase the number of light levels in the TSS model. 
The result, listed in Table 9, shows that the particular example scales well: execution 
time (in time units), generation time, and memory usage for the impatient user increase 
essentially linearly with the number of light-levels. This is not surprising as the system 
size is varied by adjusting a counter, and not the number of parallel components. 
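
For the execution times the linear growth is in fact exact. The Python sketch below checks the execution-time columns of Table 9 against an affine function of the number of light levels; the slopes and the intercept are fitted here from the reported numbers and are not stated in the experiments themselves. Generation time and memory usage are only approximately linear and are not covered by this check.

```python
# Execution times (in time units) for edge coverage, as reported in Table 9.
LEVELS = [10, 20, 30, 50, 100, 200, 400]
IMPATIENT = [263, 493, 723, 1183, 2333, 4633, 9233]
PATIENT = [63, 93, 123, 183, 333, 633]   # no value is reported for 400 levels


def affine(slope, intercept, levels):
    """Execution time predicted by slope * levels + intercept."""
    return slope * levels + intercept


def fits(slope, intercept, levels, times):
    """True if every reported time matches the affine prediction exactly."""
    return all(affine(slope, intercept, n) == t for n, t in zip(levels, times))
```

Both `fits(23, 33, LEVELS, IMPATIENT)` and `fits(3, 33, LEVELS, PATIENT)` hold, i.e. each additional light level costs the impatient user 23 time units and the patient user 3.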

Table 8. Optimal execution time and suite length for various coverage criteria

                                            Impatient              Patient
Coverage criterion                      Execution  Suite      Execution  Suite
                                        time       length     time       length
Location_Dimmer                            20        12          20        12
Location_{Dimmer, Switch, Interface}       25        17          25        17
(label illegible)                         253       176          53        38
Edge_Interface                             15        14          15        14
Edge_{Dimmer, Switch, Interface}          263       188          63        50
Edge_Interface + Location_Dimmer           25        19          25        19
Def-Use_on                                 40        34          40        34
Def-Use_OL                                 45        34          45        34

Table 9. Cost of obtaining edge coverage of the TSS with increasing light levels

                     Impatient                              Patient
Levels   Execution  Generation  Memory        Execution  Generation  Memory
         time       time (s)    usage (MB)    time       time (s)    usage (MB)
  10        263        2.06        9.1           63         3.19       10.1
  20        493        3.68       11.4           93        12.40       20.1
  30        723        5.29       12.6          123        28.17       40.4
  50       1183        8.59       17.4          183        78.30       86.9
 100       2333       16.76       28.0          333       339.52      314.9
 200       4633       34.45       44.3          633      1494.35     1233.8
 400       9233       66.03       77.1          N/A        >7000    >4180.6

It is more interesting to compare the patient and impatient user. Consider the system
with 50 light levels. The optimal execution time for the impatient user is high (1183 time
units), the reason being that the light level is increased only by one before he gives up
and starts the hold action again. Obtaining coverage therefore requires many interactions
(trace of length 828). In contrast, the optimal execution time for the patient user is 183
time units (and the trace length is 130). If we compare the generation time, it can be seen
that it is much cheaper to compute the (very long) optimal solution for the impatient
user than to compute the (short) optimal solution for the patient user.

Although this is surprising, there is a potential general explanation for this. The
patient user environment poses no restrictions on the solution, and the test generator
has complete freedom to find the optimal solution. This means that the test generator has to
evaluate all possible behaviors of this liberal environment. The impatient user is a more
restricted environment, thus containing fewer possible behaviors. Therefore, searching the
more liberal environment takes longer but also produces faster solutions.

There are two lessons to be learned. First, an accurate model of the environment
assumptions matters. Secondly, the environment model can be used to control test
generation: restricting the environment makes it possible to handle larger systems, but
at the cost of more expensive solutions.

We have also created a DIEOU-TA version of the Philips audio control protocol [3],
frequently studied in the context of model checking. The system consists of a sender and
a receiver communicating over a shared bus. The sender inputs a sequence of bits to be
transmitted, Manchester encodes them, and transmits them as high and low voltage on
the bus. Further, it checks for collisions by checking that the bus is indeed low when it
is itself sending a low signal. The receiver is triggered by low-to-high transitions on the
bus, and decodes the bits based on this information.

Table 10. Results for the Philips audio protocol

Coverage criterion              Execution    Generation   Memory
                                time (μs)    time (s)     usage (KB)
Edge_Sender                      212350         2.2          9416
Edge_Receiver                     18981         1.2          4984
Edge_{Sender, Bus, Receiver}     114227       129.0        331408

Table 11. Cost of edge coverage of TSS (Max=30) using different search orders

                     First Solution                    Optimal Solution
Search    Execution  Generation  Memory        Generation  Memory
order     time       time (s)    usage (MB)    time (s)    usage (MB)
BF           123       27.91       40.8           N/A         N/A
DF           791        0.15        4.9           N/A         N/A
C_BF         123       30.44       42.6          31.31        43.3
C_DF         791        0.15        6.5         248.64       127.0
C_BF_R       123       30.70       42.6          30.87        42.9
C_DF_R       791        0.15        6.4          21.62        32.1
C_MC         123       25.87       39.3          26.19        39.5
C_MC_R       123        3.23       13.0           3.32        13.1

Table 10 summarizes the results. The first row contains results for the protocol tested 
with an environment consisting of a bus that may spontaneously go high to emulate a 
collision, and a sender buffer producing any legal input-bit sequence. The second row 
shows results for a receiver tested in an environment consisting of a bus, and a buffer to 
hold the received bits. The third row shows the results for the receiver tested in an environment
consisting of a sender component with sender buffer, a bus, and receiver buffer. Thus
the last row represents a rather large system. In all cases the time optimal covering test 
sequence could be computed in reasonable time. 

5.3 Search-Order and Guiding 

Uppaal allows the state space to be traversed in several different orders with different 
performance characteristics w.r.t. execution time of the generated test suite and the size 
of the system that can be handled. In particular, the A* algorithm has a potentially significant
impact. We here demonstrate how it can be employed for test generation to efficiently
compute edge coverage in the TSS model. 

The measured numbers are listed in Table 11. BF (DF) denotes breadth-first (depth-first)
search order. The optimal execution time remains identical at 123 time units for
all search orders. We note that using depth-first search during time optimal analysis
(C_DF) Uppaal produces (many) solutions quickly, but takes a long time to ultimately
find the optimal one. During time optimal reachability analysis Uppaal (symbolically)
computes for each reached state the time $C$ accumulated so far. Let $C_g$ be the fastest
time to a goal state found so far. When another state is found during exploration with
an accumulated time $C > C_g$ further exploration from that state is unnecessary, and the
search can be pruned. Minimum accumulated time-first (MC) explores states ordered
by their minimum accumulated time. To increase the efficiency further, it is possible to
provide a safe estimate of the time $R$ that remains from a given state to the goal state.
Pruning can then be performed when a state is found with $C + R > C_g$. In Table 11 a
search order combined with a remaining estimate is suffixed by "_R".

It is easy to see in the dimmer component that the most time consuming edge to
reach is the edge with guard L = Max. As estimate of remaining time, we use (Max −
L) × delay if level L = Max has not been reached, and 0 otherwise. Intuitively, the
remaining time equals at least the number of light levels from the Max value times the time
to increase the light one level (delay). This formula has the feature that it can prune
searches that turn back to lower light levels.
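
The effect of the pruning rule and of the remaining-time estimate can be demonstrated on a toy version of the dimmer. The Python sketch below is not Uppaal's algorithm: it is a plain depth-first branch-and-bound over a chain of light levels, with the hypothetical names `fastest_time`, `MAX` and `DELAY`, and with "reach level Max" standing in for the coverage goal. It returns the optimal time together with the number of states explored.

```python
import math

MAX, DELAY = 10, 1   # toy dimmer: levels 0..MAX, one DELAY per level change


def successors(level):
    """Moves of the toy dimmer: one level up first, then one level down."""
    if level < MAX:
        yield DELAY, level + 1
    if level > 0:
        yield DELAY, level - 1


def fastest_time(start, is_goal, remaining=lambda state: 0):
    """Depth-first search pruning states with C + R > C_g."""
    best = None       # C_g: fastest time to a goal state found so far
    reached = {}      # fastest accumulated time at which a state was explored
    explored = 0
    stack = [(start, 0)]
    while stack:
        state, cost = stack.pop()
        if best is not None and cost + remaining(state) > best:
            continue                      # cannot improve on C_g: prune
        if reached.get(state, math.inf) <= cost:
            continue                      # already explored at least as fast
        reached[state] = cost
        explored += 1
        if is_goal(state):
            best = cost
            continue
        for wait, nxt in reversed(list(successors(state))):
            stack.append((nxt, cost + wait))
    return best, explored


def estimate(level):
    """Safe estimate of the remaining time: (Max - L) * delay."""
    return (MAX - level) * DELAY
```

Starting from level 5, `fastest_time(5, lambda l: l == MAX)` explores all 11 levels, while `fastest_time(5, lambda l: l == MAX, estimate)` finds the same optimal time after exploring only the 6 levels on the way up, because every move back to a lower level is pruned.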

Compared to C_BF, minimum accumulated time first search (C_MC) offers slightly
improved generation time and memory usage. However, enabling the remaining time estimate
combined with this search order (C_MC_R) has a dramatic positive effect (3.32 s and 13.1 MB
for the optimal solution, against 26.19 s and 39.5 MB for C_MC), and
outperforms all of the other evaluated search orders.



6 Conclusions and Future Work 

In this paper, we have presented a new technique for generating timed test sequences for
a restricted class of timed automata. It is able to generate time optimal test sequences
from either a single test purpose or a coverage criterion using the time optimal reachability
feature of Uppaal. Through a number of examples we have demonstrated how our
technique works and performs. We conclude that it can generate practically relevant test
sequences for systems of practically relevant size. However, we have also found a number
of areas where our technique can be improved.

The DIEOU-TA model is quite restrictive, and a generalization would benefit many
real-time systems. We are working on loosening the output urgency requirement. It may
also be interesting to formulate coverage criteria that consider clock constraints.

Adding the required annotations for various coverage criteria by hand, and manually
formulating the associated reachability property, is tedious and error prone. We are
working on a tool that performs these tasks automatically. Finally, we have found that the
bit-vector annotations for tracking coverage and remaining time estimates may increase
the state space significantly, and consequently also generation time and memory. The
extra bits do not influence model behavior, and should therefore be treated differently
in the verification engine. We are working on techniques that ignore these bits when
possible, and that take advantage of them to prune states with “less” coverage.

Abstract. We study the generation of test cases for nondeterministic
real-time systems. We define a class of Determinizable Timed Automata 
(DTA), in order to specify the system under test. The principle of our 
test method consists of two steps. In Step 1, we express the problem in 
a non-real-time form, by transforming a DTA into an equivalent finite 
state automaton. The latter uses two additional types of events, Set and 
Exp. In Step 2, we adapt a non-real-time test generation method. 

Keywords: Real-time systems, Conformance test cases generation, De-

1 Introduction

(Conformance) testing aims to check whether an implementation conforms to a
specification. Testing is realized by: 1) synthesizing test cases from the
specification, and 2) executing them on the implementation under test (IUT).
We focus on the synthesis phase, but we also propose a test architecture
for the execution phase. We consider the case of real-time systems, where the
specification describes order and timing constraints of the interactions between
IUT and its environment. To develop a rigorous test method, the specification
of IUT must be described in a formal way, e.g. by Timed automata
(TA) [1]. We define a class of determinizable TA (DTA), in order to describe the
specification of IUT. The principle of our test method is as follows:

— First, we express the problem in a non-real-time form, by using a method
that transforms a DTA into a finite state automaton (FSA), called Set-Exp-Automaton
(SEA) [2]. The latter uses two additional types of events: Set
and Exp. Intuitively, a Set (resp. Exp) event corresponds to the programming
(resp. occurrence) of an alarm. Let SetExp denote the transformation
operation, and SetExp(A) denote the SEA obtained by transforming a
DTA A. A and SetExp(A) are equivalent in the sense that they specify the
same order and timing constraints of events.

— Then, we use and adapt a non-real-time method of test generation, called
Test Generation with Verification technology (TGV) [3]¹.

* Visiting at IRISA from Sept. 2002 to Sept. 2003 

¹ Actually, TGV is a software tool. But here, TGV denotes the theoretical method
that underlies the tool.

The test problem we have to solve is to synthesize test cases whose execution
checks the conformance of a real-time IUT to its specification.
The rest of the paper is organized as follows. Sect. 2 describes the DTA model
used to describe the specification of IUT. In Sect. 3, we define the test problem
to be solved. Sect. 4 presents the SEA model and the transformation
“SetExp: DTA ↦ SEA” used by the test method of Sect. 6. In Sect. 5, we propose
a test architecture, and present a proposition that allows the test problem to be
transformed into a non-real-time form. Sect. 6 presents a method that solves the
test problem. In Sect. 7, we conclude the paper.

2 Determinizable Timed Automata (DTA) 

Let us present the DTA model used to describe IUT and its specification.

2.1 A Few Concepts Used in DTA 

Nondeterminism: Nondeterministic systems are systems whose current state
is not necessarily determined by observing their execution. In our study, the
specification of IUT can be nondeterministic.

Internal actions are actions that are unobservable by the environment of IUT.
Note that such unobservability may cause nondeterminism. In our study, the
specification of IUT can contain internal actions.

Forced transitions: In a real-time system, a first objective of timing constraints
is to define intervals of time where an action can occur. The notion
of forced transition makes it possible to define intervals of time where an action
must occur.

Quiescence: Basically, conformance testing checks that IUT executes
only what is permitted by its specification. The notion of quiescence has
been added to specify when IUT is allowed to stop its execution, and thus,
to become quiescent.

2.2 A Class of Determinizable Timed Automata (DTA) 

A clock $c_i$ is a real variable whose value can be reset with the occurrence of an
event and such that, between two resets, its derivative (w.r.t. time) is equal
to 1. Let $\mathcal{C} = \{c_1, \ldots, c_{N_c}\}$ be a set of clocks.

A Clock Condition (CC) consists of one or several formulas of the form
“$c_i \sim k$” where $c_i$ is a clock, $\sim\ \in \{<, >, \leq, \geq, =\}$ and $k \geq 0$.
Let $\Phi_{\mathcal{C}}$ denote the set of CCs depending on clocks of $\mathcal{C}$ (we consider
that $True \in \Phi_{\mathcal{C}}$). A clock reset is any subset of $\mathcal{C}$, and $2^{\mathcal{C}}$ denotes
the set of clock resets.



Syntax of DTA. A DTA is defined by $(\mathcal{L}, \Sigma, \mathcal{C}, \mathcal{T}, l_0)$, where: $\mathcal{L}$ is a finite set of
locations, $l_0$ is the initial location, $\mathcal{C}$ is a finite set of clocks, and $\Sigma$ is a finite set
of events. There are three types of events: the reception of an input $i$ (written
$?i$), the sending of an output $o$ (written $!o$), and the occurrence of an internal
action $a$ (written $\epsilon_a$). $\mathcal{T}$ is a set of transitions, and a transition of a DTA is defined
by $Tr = \langle q; \sigma; r; CC; Z_\sigma; x \rangle$, where: $q$ and $r$ are the origin and destination locations,
$\sigma$ is an event of $\Sigma$, $CC$ is a clock condition, $Z_\sigma$ is a clock reset, and $x \in \{f, uf\}$.
Tr is said to be forced (resp. unforced) if $x = f$ (resp. $x = uf$) (see semantics of DTA
for more details).

The only restrictions used to define DTA from TA are: 1) the index $\sigma$ in $Z_\sigma$
means that all the transitions (from any location) that have the same event
also have the same clock reset; and 2) $Z_\sigma$ is empty when $\sigma$ is an internal action.
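As an illustration, the syntax and the two restrictions above can be encoded as follows. This is only a sketch: the `Transition` class, the string encoding of events (`"?i"`, `"!o"`, and `"eps:a"` for internal actions) and the function `check_dta_restrictions` are hypothetical names chosen here, not part of the paper.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Transition:
    origin: str
    event: str                       # "?i" input, "!o" output, "eps:a" internal
    dest: str
    guard: tuple = ()                # clock condition, e.g. (("c1", "<", 3),); () is True
    resets: frozenset = frozenset()  # Z_sigma: clocks reset by the event
    forced: bool = False             # x = f (True) or x = uf (False)

def is_internal(event):
    return event.startswith("eps:")

def check_dta_restrictions(transitions):
    """Return a list of violations of the two restrictions that define DTA."""
    problems = []
    reset_of = {}                    # event -> clock reset seen first
    for tr in transitions:
        # Restriction 2: internal actions do not reset clocks.
        if is_internal(tr.event) and tr.resets:
            problems.append(f"internal action {tr.event} resets clocks")
        # Restriction 1: the clock reset depends only on the event.
        if reset_of.setdefault(tr.event, tr.resets) != tr.resets:
            problems.append(f"event {tr.event} has two different clock resets")
    return problems

example = [
    Transition("l0", "?a", "l1", (), frozenset({"c1"})),
    Transition("l1", "!b", "l2", (("c1", "<", 3),)),
    Transition("l1", "!b", "l3", (("c1", ">", 2),)),
]
print(check_dta_restrictions(example))
```

The example has two transitions on the same event with overlapping guards, which is permitted: the restrictions constrain clock resets, not nondeterminism.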



Semantics of DTA. Let us define the semantics of a DTA $A = (\mathcal{L}, \Sigma, \mathcal{C}, \mathcal{T}, l_0)$
by the set of timed traces accepted by A. The term “sequence of transitions
$Tr_1 Tr_2 \cdots Tr_i \cdots$ of A” means: $l_0$ is the origin location of $Tr_1$, and the destination
location of $Tr_i$ is the origin location of $Tr_{i+1}$. We consider that events
are instantaneous, i.e., their duration is negligible and can be assumed equal to
zero.

Notation 1 Let $X$ be a set of sequences and $X_f$ ($\subseteq X$) be the set of finite
sequences of $X$. $\overline{X}$ denotes the set of finite prefixes of $X$. Note that $X_f \subseteq \overline{X}$.

Enabled/disabled transition: a transition $Tr = \langle q; \sigma; r; CC; Z_\sigma; x \rangle$ is said to be
enabled when $q$ is the current location and $CC$ evaluates to True. Tr is said to be
disabled when it is not enabled.

Execution of event: for every $Tr = \langle q; \sigma; r; CC; Z_\sigma; x \rangle$ of A, the event $\sigma$ is
executed only when Tr is enabled; and after the execution of $\sigma$, location $r$ is
reached and the clocks in $Z_\sigma$ are reset.

A timed sequence is a (finite or infinite) sequence “$(e_1, \tau_1) \cdots (e_i, \tau_i) \cdots$”,
where $e_1, \ldots, e_i, \ldots$ are events, each $\tau_i$ is the time of occurrence of $e_i$, and
$0 \leq \tau_1 \leq \cdots \leq \tau_i \leq \cdots$.

A timed trace is obtained from a timed sequence by removing all the timed
internal events, i.e., the $(e_i, \tau_i)$ such that $e_i$ is an internal event.

At initial time $\tau_0 = 0$, A is at location $l_0$ with all clocks equal to 0.

Acceptance of the empty sequence $\lambda_0$: A accepts $\lambda_0$ iff every outgoing transition
of $l_0$ is unforced, or disabled when all clocks are equal to each other.
Intuitively, acceptance of $\lambda_0$ by A means that A can be “totally quiescent”.

Acceptance of a finite timed sequence $\lambda_n = (e_1, \tau_1) \cdots (e_n, \tau_n)$, for $e_1, \ldots, e_n \in \Sigma$.
Let $\lambda_i$ be the prefix of $\lambda_n$ of length $i$, for $1 \leq i \leq n$. $\lambda_n$ is accepted
by A iff there exists a sequence of transitions $Tr_1 Tr_2 \cdots Tr_n$ of A s.t.:

1. $\forall i \in \{1, \ldots, n\}$: the event of $Tr_i$ is $e_i$; after the execution of $\lambda_{i-1}$, $Tr_i$
is enabled at time $\tau_i$; and no forced transition is enabled at a time
$\tau \in\ ]\tau_{i-1}, \tau_i[$ and disabled at $\tau_i$. Intuitively, A can execute $\lambda_n$.

2. $\forall$ forced transition Tr and $\forall u > \tau_n$ s.t. Tr is enabled at time $u$ after the
execution of $\lambda_n$, we have: $\forall \tau > u$, Tr is enabled at time $\tau$.
Intuitively, A can be “quiescent forever” after the execution of $\lambda_n$.

Acceptance of an infinite timed sequence $\lambda_\infty$ is defined from acceptance
of a finite timed sequence by: removing Item 2 and replacing $n$ by $\infty$. Intuitively,
A can execute $\lambda_\infty$.

Acceptance of a timed trace $\mu = (e_1, \tau_1)(e_2, \tau_2) \cdots$: $\mu$ is accepted by A iff $\mu$
is obtained by removing all the timed internal actions of a timed sequence
accepted by A. Intuitively, A can have an execution observed as $\mu$.

The timed observable language of A ($TOL^{DTA}_A$) is the set of timed traces
accepted by A. That is, $TOL^{DTA}_A$ models the observable behaviour of A.

The timed quiescence observable language of A ($TQOL^{DTA}_A$) is obtained
from $TOL^{DTA}_A$ by keeping only the finite timed traces. Intuitively, after the
observation of $\mu \in TQOL^{DTA}_A$, A accepts to be quiescent forever, i.e., we may
wait forever without observing any event. Formally:
$TQOL^{DTA}_A = (TOL^{DTA}_A)_f$ (see Notation 1).

Hypothesis 1 We assume that infinite timed sequences (and traces) accepted 
by A are non-zeno, i.e., have an infinite duration, because zeno executions do 
not correspond to concrete systems. 

In figures, locations are represented by nodes, a transition
$Tr = \langle q; \sigma; r; CC; \{c_i, c_j, \ldots\}; x \rangle$ is represented by an arrow linking $q$ to $r$ and
labelled by $(\sigma; CC; \{c_i, c_j, \ldots\})$, and the absence of CC or of clock reset is indicated
by “-”. $x$ is not represented in figures; when relevant, we mention whether a
given transition is forced or not. Let us consider the DTA of Fig. 1a. We do not
indicate the types of events (i.e., input, output, internal action) because they
are irrelevant for the comprehension of the example. Let $\delta_{u,v}$ denote the delay
between events $u$ and $v$.

— The DTA is initially in location $l_0$. It reaches $l_1$ at the occurrence of $\alpha$.

— From $l_1$, the DTA reaches $l_2$ or $l_3$ at the occurrence of $\beta$. $l_2$ (resp. $l_3$) is
reached only if $\delta_{\alpha,\beta} < 3$ (resp. $\delta_{\alpha,\beta} > 2$). We see that there is
nondeterminism when $2 < \delta_{\alpha,\beta} < 3$.

— From $l_2$, the DTA reaches $l_0$ at the occurrence of $\gamma$. We have $\delta_{\beta,\gamma} > 1$ and
$\delta_{\alpha,\gamma} < 2$.

— From $l_3$, the DTA reaches $l_0$ at the occurrence of $\rho$. We have $\delta_{\beta,\rho} > 1$ and
$\delta_{\alpha,\rho} > 2$.

To clarify the notion of forced transition, let us consider the case where location $l_2$
is reached while $c_1 > 1$ and $c_2 < 2$, i.e., $\gamma$ is enabled as soon as $l_2$ is reached.
Then $\gamma$: either 1) will occur before $c_2 = 2$, or 2) will never occur. Case 2 is
impossible if $\gamma$ is in a forced transition.

The DTA of Fig. 1a is nondeterministic, and its equivalent deterministic DTA
is in Fig. 1b.

3 Test Problem to Be Solved 

3.1 Conformance Relation between DTA, Some Related Lemmas 

In the following: I and S are two DTAs over the same alphabet $\Sigma$, and $o$ is
an output of $\Sigma$. The following conformance relation is an adaptation of ioco of
[4,3] to the real-time case.

Fig. 1. Example: (a) nondeterministic DTA, and (b) equivalent deterministic DTA



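Definition 1 $I\ \mathbf{ioco}_{DTA}\ S$ means, $\forall \lambda \in \overline{TOL^{DTA}_S}$:

1) $(\lambda \cdot (o, \tau) \in \overline{TOL^{DTA}_I}) \Rightarrow (\lambda \cdot (o, \tau) \in \overline{TOL^{DTA}_S})$, and

2) $(\lambda \in TQOL^{DTA}_I) \Rightarrow (\lambda \in TQOL^{DTA}_S)$.

Let IUT be modelled by I. The intuition of “$I\ \mathbf{ioco}_{DTA}\ S$” is that after an
execution of IUT accepted by S: 1) IUT can generate an output $o$ at time $\tau$
only if S accepts $o$ at time $\tau$, and 2) IUT can remain quiescent forever only if
S accepts to be quiescent forever.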

Definition 2 The input-completion of a DTA $A = (\mathcal{L}, \Sigma, \mathcal{C}, \mathcal{T}, l_0)$ is a DTA
InpComp(A) that contains all the timed traces of A, as well as all the timed
traces that diverge from the timed traces of A by executing inputs not accepted
by A². A is said to be input-complete iff A = InpComp(A). Intuitively, an
input-complete DTA accepts every input at any time.



Hypothesis 2 We consider without loss of generality, that input transitions are 
always unforced, because they are under the control of the environment. 



Lemma 1³ We have: $(I\ \mathbf{ioco}_{DTA}\ S) \Leftrightarrow (I\ \mathbf{ioco}_{DTA}\ InpComp(S))$.

Lemma 1 implies that we can make S input-complete before checking whether
a DTA conforms to it, w.r.t. $\mathbf{ioco}_{DTA}$. Hence the following hypothesis:

Hypothesis 3 With Lemma 1, when we use $\mathbf{ioco}_{DTA}$ we can (and will) consider,
without loss of generality, that S is input-complete.

Lemma 1 and Hyp. 3 are inspired by their non-real-time versions in [5].

Lemma 2 With Hypothesis 3: $I\ \mathbf{ioco}_{DTA}\ S \Leftrightarrow TOL^{DTA}_I \subseteq TOL^{DTA}_S$.

Lemma 2 means that with Hypothesis 3, $\mathbf{ioco}_{DTA}$ is simplified into an inclusion
of timed observable languages of DTA.

² For lack of space, we do not give a formal construction.

³ All lemmas and propositions of this article have been rigorously proved. For lack of
space, the proofs are not presented.

3.2 Test Purpose, and Test Hypothesis

A complete DTA has all its events enabled at any time from any location. A trap
location is a location in which, for each event $\sigma \in \Sigma$, there is a selfloop labelled $\sigma$.

Definition 3 A test purpose is used to select a particular functionality of IUT
to be tested. In our study, a test purpose is modelled by a complete and deterministic
DTA TP, equipped with two sets of trap locations A and R (for Accept
and Refuse, respectively). All transitions of TP are unforced. Sequences to be
tested are those terminating in a location A. Sequences not to be tested are those
traversing a location R.

A test purpose makes it possible to extract a relatively small part of the specification
before applying a test generation method [3]. We use the following usual test
hypothesis:

Hypothesis 4 IUT can be described by a (possibly unknown) input-complete
DTA $\mathit{IUT}$.

3.3 Formalization of the Test Generation Problem 

The inputs of the test generation problem are two DTAs Spec and TP, describing
the specification and the test purpose, resp., over the same alphabet that consists
of inputs, outputs and internal actions. The test problem is to synthesize test
cases that will be executed on IUT to determine whether: $\mathit{IUT}\ \mathbf{ioco}_{DTA}\ Spec$.

The test purpose TP means that the test system will ignore every execution
$\lambda$ of IUT (i.e., $\lambda \in TOL^{DTA}_{IUT}$) that has a prefix $\mu$ accepted by Spec (i.e.,
$\exists \mu \in \overline{\{\lambda\}} \cap TOL^{DTA}_{Spec}$) and s.t.: a location R of TP may be reached by $\mu$,
or no location A of TP is reachable after $\mu$ by Spec.

By using Lemma 1 and Hyp. 3, we assume Spec input-complete. Let us
present, in Section 4, the SEA model and the transformation
“SetExp: DTA ↦ SEA” that will be used by the test method of Section 6.

4 Transformation of DTA into SEA 

We present SetExp, which transforms a DTA A into an FSA called Set-Exp-Automaton
(SEA) and denoted SetExp(A). The latter uses two types of events,
Set and Exp, in addition to the events of A. The DTA A and the SEA SetExp(A)
are equivalent because they specify exactly the same order and timing constraints
(of events other than Set and Exp) [2].

4.1 Events Set and Exp 

Event $Set(c_i, k)$ means: clock $c_i$ is set to zero and will expire when its value is
equal to $k$.

Event $Exp(c_i, k)$ means: clock $c_i$ expires and its current value is $k$.

Therefore, $Set(c_i, k)$ is followed (after a delay $k$) by $Exp(c_i, k)$.
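The relation between Set and Exp events can be mimicked in simulated time by a small alarm service. The sketch below is an assumption-laden illustration: the class `ClockHandler` and its methods `set` and `expirations` are hypothetical names, time is passed in explicitly rather than read from a real clock, and the case of a clock being set again before it expires is deliberately not handled.

```python
import heapq

class ClockHandler:
    """Turns every Set(c, k) received at time t into an Exp(c, k) due at t + k."""

    def __init__(self):
        self._pending = []  # min-heap of (expiry time, clock, k)

    def set(self, now, clock, k):
        """Handle the event Set(clock, k) received at time `now`."""
        heapq.heappush(self._pending, (now + k, clock, k))

    def expirations(self, now):
        """Return the Exp events due at or before `now`, in order of expiry."""
        due = []
        while self._pending and self._pending[0][0] <= now:
            t, clock, k = heapq.heappop(self._pending)
            due.append((t, "Exp", clock, k))
        return due

handler = ClockHandler()
handler.set(0.0, "c1", 2)   # Set(c1, 2) at time 0
handler.set(0.5, "c2", 1)   # Set(c2, 1) at time 0.5
early = handler.expirations(1.0)
middle = handler.expirations(1.5)
late = handler.expirations(10)
```

Each Set event produces exactly one Exp event, separated from it by the delay k.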
Fig. 2. Example: (a) DTA, and (b) corresponding SEA



4.2 Transitions of SEA 

In the following: $\sigma$ denotes an event of the alphabet of the DTA A, $\mathcal{S}$ (resp.
$\mathcal{E}$) denotes a set of Set (resp. Exp) events, and occurrence of $\mathcal{S}$ (resp. $\mathcal{E}$) means
the simultaneous occurrences of all the events of $\mathcal{S}$ (resp. $\mathcal{E}$). Here are the three
types of transitions of a SEA SetExp(A):

Type 1: a transition labelled $(\mathcal{E})$ represents the occurrence of $\mathcal{E}$.

Type 2: a transition labelled $(\sigma)$ or $(\sigma, \mathcal{S})$: $(\sigma)$ represents the occurrence of $\sigma$,
and $(\sigma, \mathcal{S})$ represents the simultaneous occurrences of $\sigma$ and $\mathcal{S}$.

Type 3: a transition labelled $(\mathcal{E}, \sigma)$ or $(\mathcal{E}, \sigma, \mathcal{S})$: $(\mathcal{E}, \sigma)$ represents the
simultaneous occurrences of $\mathcal{E}$ and $\sigma$, and $(\mathcal{E}, \sigma, \mathcal{S})$ represents the simultaneous
occurrences of $\mathcal{E}$, $\sigma$ and $\mathcal{S}$.

Definition 4 An Exp-Trans of a SEA B is any transition of type 1 or 3, i.e., one whose
label contains one or several Exp events.

4.3 Example of Transformation “SetExp: DTA ↦ SEA”

For the DTA A of Fig. 2a, if all transitions are unforced, then we obtain the SEA
SetExp(A) of Fig. 2b. The concept of forced transition was not considered in [2];
let us explain its influence on the result of SetExp. A transition TR (of Type 2
or 3) of SetExp(A) that corresponds to a forced transition Tr of A is also called a
forced transition. If a state q of SetExp(A) has an outgoing forced transition TR1
which becomes disabled with the occurrence of an Exp-Trans TR2, then TR1
must preempt TR2, and thus, TR2 never occurs. For every state q, we define
the set PreemptExp(q), which contains all the labels of the preempted Exp-Trans
in q.

For the example of Fig. 2, if $\gamma$ is in a forced transition, then the outgoing
$Exp(c_1, 2)$ of state M is cut and we have $PreemptExp(M) = \{Exp(c_1, 2)\}$.
Concretely, since $Exp(c_1, 2)$ leads to a state where (the forced) $\gamma$ is disabled,
such $Exp(c_1, 2)$ must be cut, which means that it is preempted by $\gamma$.

4.4 Languages and Quiescences of SEA 

Let A be a DTA and B = SetExp(A) be a SEA of alphabet $\Lambda$ and origin state
$q_0$. Let us define the semantics of B by the set of traces accepted by B.

A final state q of B is a state that: - has no outgoing transitions, in which case q is
called a deadlock state; or - has neither outgoing Exp-Trans nor preempted
Exp-Trans, in which case q is called a patient state. Intuitively, B can remain forever
in a final state without executing any event.

A sequence is any (finite or infinite) sequence “$E_1 E_2 \cdots E_i \cdots$”, where $E_i \in \Lambda$.

A trace is obtained by removing all internal actions of a sequence.

Acceptance of finite sequence $\lambda_n$: $\lambda_n$ is accepted by B iff it is a sequence of
B that terminates in a final state. Intuitively, B can execute $\lambda_n$ and become
“quiescent”.

Acceptance of infinite sequence $\lambda_\infty$: $\lambda_\infty$ is accepted by B iff it is a sequence
of B.

Acceptance of trace $\mu$: $\mu$ is accepted by B iff $\mu$ is obtained by removing the
internal actions of a sequence accepted by B. Intuitively, B can have an
execution observed as $\mu$.

The observable language of B ($OL^{SEA}_B$) consists of the traces accepted by B.

The quiescent observable language of B ($QOL^{SEA}_B$) consists of the finite
traces accepted by B.

Consistency condition requires that every $Set(c, k)$ and the corresponding
$Exp(c, k)$ are separated by a delay $k$. It is implicitly respected in $OL^{SEA}_B$ and
$QOL^{SEA}_B$.
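The distinction between deadlock, patient and non-final states of a SEA can be sketched as a simple classification. The function name `classify_state` and the encoding of a label as a tuple of event strings (with Exp events written `"Exp(c,k)"`) are assumptions made for this illustration.

```python
def classify_state(outgoing_labels, preempted_exp=()):
    """Classify a SEA state from the labels of its outgoing transitions.

    outgoing_labels: iterable of labels, each a tuple of event strings.
    preempted_exp:   labels in PreemptExp(q), i.e. preempted Exp-Trans.
    """
    labels = list(outgoing_labels)
    if not labels:
        return "deadlock"                 # final: no outgoing transition
    has_exp_trans = any(event.startswith("Exp")
                        for label in labels for event in label)
    if not has_exp_trans and not preempted_exp:
        return "patient"                  # final: no time-out can force a move
    return "non-final"

print(classify_state([]))
print(classify_state([("?a",), ("!b", "Set(c1,2)")]))
print(classify_state([("Exp(c1,2)", "!b")]))
```

A state with a preempted Exp-Trans is not final even if no Exp event labels its outgoing transitions, since the forced transition that preempts it must eventually occur.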

Lemma 3 Hypothesis 1 implies that every (finite) $\mu \in QOL^{SEA}_B$ is obtained by
removing all internal actions from a finite sequence accepted by B.

It is clear that when we are in a final state, then B can remain quiescent
forever. Lemma 3 states that the converse holds assuming Hypothesis 1. More
precisely, Hypothesis 1 implies that if B is quiescent for an arbitrarily long time,
then we are in a final state or we are switching between final states through
internal actions. Therefore:

Definition 5 Final states will be also called quiescent states. 

5 Test Architecture, and a Proposition 

We use the test architecture of Fig. 3, proposed in [6] and consisting of the
following modules:

Clock-Handler receives Set events and sends Exp events. (It respects the consistency
condition.)

Test-Controller sends inputs to IUT, receives outputs from IUT, sends Set
events to Clock-Handler, and receives Exp events from Clock-Handler.

The following relation $\mathbf{conf}_{SEA}$ is simply an inclusion of observable languages
of SEAs.

Definition 6 Let I' and S' be two SEAs over the same alphabet:
$(I'\ \mathbf{conf}_{SEA}\ S') \Leftrightarrow (OL^{SEA}_{I'} \subseteq OL^{SEA}_{S'})$.

We have the following proposition, where SUT consists of IUT and Clock-Handler.

Proposition 1 Let S be an input-complete DTA. Assuming that Test-Controller
generates Set events only when they are accepted by SetExp(S), we have:

$(\mathit{IUT}\ \mathbf{ioco}_{DTA}\ S) \Leftrightarrow (\exists$ SEA $\mathit{SUT}$ accepting the behaviour of SUT s.t.
$\mathit{SUT}\ \mathbf{conf}_{SEA}\ SetExp(S))$.

The above proposition implies that we can check “$\mathit{SUT}\ \mathbf{conf}_{SEA}\ SetExp(S)$”
instead of “$\mathit{IUT}\ \mathbf{ioco}_{DTA}\ S$”. We have transformed the test problem into a
non-real-time form, and thus, we will adapt a non-real-time method of Test
Generation with Verification technology (TGV) [3].




Fig. 3. Test architecture 



Note that this architecture implies that Set (resp. Exp) events are inputs 
(resp. outputs) of SUT. 

Remark 1 Recall that in the DTA model, internal (i.e., unobservable) actions
do not reset clocks. Interestingly, this restriction is also required by the proposed
architecture. In fact, in order to generate Set events, Test-Controller must observe
every event with which a clock reset is associated.



6 Synthesis of Test Cases 

Our test method consists of six steps outlined in the diagram of Fig. 4 and
described in subsections 6.1 to 6.6. Its inputs are Spec and TP. In a first step,
we compute a DTA SpecTP equivalent to Spec such that locations of SpecTP that
correspond to locations A (resp. R) of TP are denoted A (resp. R). Then, we
synthesize in five steps a set of test cases that can be used to determine whether:
$\mathit{IUT}\ \mathbf{ioco}_{DTA}\ SpecTP$. The indication A and R is used to ignore every execution
of IUT that leads to a location R or does not allow a location A to be reached.

The fact that TP is deterministic and complete implies that Spec is input-complete
iff SpecTP is input-complete. By using Lemma 1 and Hyp. 3, we assume
Spec input-complete.

Figure 5 represents Spec and TP of alphabet $\Sigma = \{?\phi, ?\sigma, !\rho, \epsilon_a, \epsilon_b\}$ used
to illustrate the six steps of the test method. $\neq ?\sigma$ denotes any event $\in \Sigma \setminus \{?\sigma\}$.
Actually, Spec was not initially input-complete and we represent by dotted
arrows the part that has been added to make Spec input-complete. The test
purpose means that we are interested in testing executions of Spec terminating with
the first occurrence of $!\rho$ without traversing Location TL.
Fig. 4. Steps of the test method




Fig. 5. Example for illustrating the test method 



6.1 Step 1: Synchronous Product of Spec and TP 

Definition 7 Let $A_i = (\mathcal{L}_i, \Sigma_i, \mathcal{C}_i, \mathcal{T}_i, l_{0i})$, for $i = 1, 2$, be two DTAs. The synchronous
product of $A_1$ and $A_2$, written $A_1 \otimes A_2$, is quite similar to the classical
product of TA. The only difference is due to the notion of forced transitions, which
we process as follows: a transition of $A_1 \otimes A_2$ is unforced iff it is a “synchronization”
of two unforced transitions of $A_1$ and $A_2$⁴.

We compute $SpecTP = Spec \otimes TP$. Locations of SpecTP that correspond
to locations A (resp. R) of TP are denoted A (resp. R). The fact that TP
is complete implies that Spec and SpecTP are observationally equivalent (i.e.,
$TOL^{DTA}_{Spec} = TOL^{DTA}_{SpecTP}$). The effect of $Spec \otimes TP$ is to determine in Spec the
executions that correspond to locations A and R, respectively. For Spec and TP
of Fig. 5, we obtain the SpecTP of Fig. 6. Locations $L_0$ and $(L_0, A)$ are equivalent
in the sense that Spec is executable from these locations. The difference between
these two locations is that only $(L_0, A)$ corresponds to Location A of TP.
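The handling of forced transitions in the product can be sketched as follows. The paper gives no detailed definition, so this is only one plausible reading: transitions are plain tuples `(origin, event, dest, guard, resets, forced)`, guards are tuples of atomic conditions whose concatenation stands for conjunction, and the function name `synchronous_product` is hypothetical.

```python
def synchronous_product(trans1, init1, trans2, init2):
    """Product of two transition lists, synchronizing on identical events.

    A product transition is unforced iff both synchronized transitions are
    unforced; only locations reachable from (init1, init2) are built.
    """
    product = []
    seen = {(init1, init2)}
    todo = [(init1, init2)]
    while todo:
        q1, q2 = todo.pop()
        for o1, e1, d1, g1, z1, f1 in trans1:
            if o1 != q1:
                continue
            for o2, e2, d2, g2, z2, f2 in trans2:
                if o2 != q2 or e2 != e1:
                    continue
                forced = f1 or f2          # unforced iff both are unforced
                product.append(((q1, q2), e1, (d1, d2), g1 + g2, z1 | z2, forced))
                if (d1, d2) not in seen:
                    seen.add((d1, d2))
                    todo.append((d1, d2))
    return product

# Toy specification and a complete, unforced test purpose with trap location "A".
spec = [("l0", "?a", "l1", (), frozenset({"c1"}), False),
        ("l1", "!b", "l0", (("c1", "<", 3),), frozenset(), True)]
tp = [("t0", "?a", "t0", (), frozenset(), False),
      ("t0", "!b", "A", (), frozenset(), False),
      ("A", "?a", "A", (), frozenset(), False),
      ("A", "!b", "A", (), frozenset(), False)]
spec_tp = synchronous_product(spec, "l0", tp, "t0")
```

Because all transitions of the test purpose are unforced, a transition of the product is forced exactly when the corresponding transition of the specification is forced.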

6.2 Step 2: Transforming the DTA SpecTP into a SEA 

We compute $SpecTP^{SEA} = SetExp(SpecTP)$. For the SpecTP of Fig. 6, we obtain
the $SpecTP^{SEA}$ of Fig. 7, if no transition is forced.

— Quiescent states of $SpecTP^{SEA}$ that correspond to locations A are denoted
A. We consider only quiescent states, because sequences to be tested are
those terminating in Location A.

— States of $SpecTP^{SEA}$ that correspond to locations R are denoted R. We consider
quiescent and non-quiescent states, because sequences not to be tested
are those traversing (and thus, not necessarily terminating in) Location R.

⁴ For lack of space, we do not give a more detailed definition.

Fig. 6. Step 1: SpecTP obtained from Spec and TP of Fig. 5

In Fig. 7: ?* denotes any input (i.e., $?\phi$ or $?\sigma$); $\Sigma$ means any event
$x \in \Sigma = \{?\phi, ?\sigma, !\rho, \epsilon_a, \epsilon_b\}$; $!Exp_i$ denotes $!Exp(c_1, i)$ for $i = 2, 3$;
$(!Exp_i, \Sigma)$ means the simultaneous occurrence of $!Exp_i$ and any $x \in \Sigma$; quiescent
states are indicated by q; nodes linked by a dotted line correspond to the same state;
and State A is equivalent to the original state $s_0$ with the difference that $s_0$ does not
correspond to a location A of TP.




6.3 Step 3: Extracting the Visible Behaviour of $SpecTP^{SEA}$

By analogy with [3], we construct the visible behaviour of $SpecTP^{SEA}$ in four
substeps:

1. A $!\delta$ selfloop is added to every quiescent state of $SpecTP^{SEA}$, i.e., a state
reached by a finite execution. The result is denoted $Quies(SpecTP^{SEA})$.

2. $Quies(SpecTP^{SEA})$ is projected onto the visible alphabet. The result is denoted
$Vis(Quies(SpecTP^{SEA}))$. Lemma 3 implies that finite traces are those
that reach a state of $Vis(Quies(SpecTP^{SEA}))$ from which $!\delta$ is executable.

3. $Vis(Quies(SpecTP^{SEA}))$ is determinized. We obtain
$Determ(Vis(Quies(SpecTP^{SEA})))$.

4. We denote by R every state corresponding to at least one state R of
$SpecTP^{SEA}$, and by A every state corresponding to no state R and at least
one state A of $SpecTP^{SEA}$. This completes the construction of the visible
behaviour of $SpecTP^{SEA}$, referred to below as the result of Step 3.
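The four substeps can be sketched with a standard subset construction. This is an illustration under stated assumptions, not the procedure of [3]: the function name `visible_behaviour`, the label `"!delta"` for the quiescence selfloop, and the convention that internal actions are labels starting with `"eps:"` are all hypothetical.

```python
def visible_behaviour(transitions, initial, quiescent, accept, refuse,
                      is_internal=lambda label: label.startswith("eps:")):
    """Quies, Vis and Determ in one pass over a list of (state, label, state)."""
    # Substep 1 (Quies): add a !delta selfloop to every quiescent state.
    trans = list(transitions) + [(q, "!delta", q) for q in quiescent]

    def closure(states):
        # Substep 2 (Vis): states reachable through internal actions only.
        seen, stack = set(states), list(states)
        while stack:
            s = stack.pop()
            for origin, label, dest in trans:
                if origin == s and is_internal(label) and dest not in seen:
                    seen.add(dest)
                    stack.append(dest)
        return frozenset(seen)

    # Substep 3 (Determ): subset construction over visible labels.
    start = closure({initial})
    det, todo = {}, [start]
    while todo:
        macro = todo.pop()
        if macro in det:
            continue
        moves = {}
        for origin, label, dest in trans:
            if origin in macro and not is_internal(label):
                moves.setdefault(label, set()).add(dest)
        det[macro] = {label: closure(dests) for label, dests in moves.items()}
        todo.extend(det[macro].values())

    # Substep 4: R has priority over A when marking determinized states.
    marks = {}
    for macro in det:
        if macro & set(refuse):
            marks[macro] = "R"
        elif macro & set(accept):
            marks[macro] = "A"
    return start, det, marks

sea = [("s0", "?a", "s1"), ("s1", "eps:x", "s2"),
       ("s1", "!b", "s3"), ("s2", "!b", "s4")]
start, det, marks = visible_behaviour(sea, "s0", {"s3", "s4"}, {"s3"}, {"s4"})
```

In the toy automaton, the output `!b` leads to a determinized state that merges an A state and an R state, so it is marked R.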

Finite traces of $SpecTP^{SEA}$ (i.e., traces of $QOL^{SEA}_{SpecTP^{SEA}}$) are those that terminate
in a state having an outgoing (possibly selfloop) transition $!\delta$. For the
$SpecTP^{SEA}$ of Fig. 7, we obtain the result of Step 3 shown in Fig. 8, where: $\Sigma_o$ means any
observable event $x \in \{?\sigma, ?\phi, !\rho\}$, and State A is equivalent to the original state
$s_0$ with the difference that $s_0$ does not correspond to location A.

Remark 2 Determinization is applied to SEA and not to DTA. This is so, 
because the determinization procedure of DTA does not necessarily preserve qui- 
escence. 




6.4 Step 4: Separating Inputs and Outputs 

In the following, the term input denotes an input of IUT or a Set event, and
output denotes an output of IUT or an Exp event (see end of Section 5).

Each transition Tr of type 2 or 3 that is labelled by both inputs and outputs
of SUT is replaced by an output transition Tr1 which is immediately followed
by an input transition Tr2 (see Fig. 9). The intermediate state between Tr1 and
Tr2 is called an instantaneous state because its duration must be null. Inputs are
put after outputs because, in order to execute a transition Tr labelled by inputs
and outputs, Test-Controller waits until SUT generates the outputs of Tr and
then immediately generates the inputs of Tr. The outcome is referred to below as
the result of Step 4. For our example, each transition labelled $(!Exp(c_1, i), ?x)$ is replaced
by two consecutive transitions labelled $!Exp(c_1, i)$ and $?x$, respectively, where
$x = \sigma$ or $\phi$.
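The separation can be sketched as a rewriting of the transition list. The function name `separate_io`, the encoding of labels as tuples of event strings, the convention that outputs start with `"!"`, and the naming of instantaneous states as `("inst", n)` are assumptions made for this example.

```python
def separate_io(transitions, is_output=lambda event: event.startswith("!")):
    """Split every transition mixing outputs and inputs into two transitions.

    transitions: list of (origin, label, dest), label being a tuple of events.
    Outputs come first; the intermediate state is an instantaneous state.
    """
    result = []
    for n, (origin, label, dest) in enumerate(transitions):
        outputs = tuple(e for e in label if is_output(e))
        inputs = tuple(e for e in label if not is_output(e))
        if outputs and inputs:
            instantaneous = ("inst", n)
            result.append((origin, outputs, instantaneous))  # Tr1
            result.append((instantaneous, inputs, dest))      # Tr2
        else:
            result.append((origin, label, dest))
    return result

mixed = [("s1", ("!Exp(c1,2)", "?sigma"), "s2"),
         ("s2", ("!rho",), "s3")]
separated = separate_io(mixed)
```

Only the first transition is split; a transition labelled by outputs only (or inputs only) is left unchanged.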
Fig. 9. Step 4: Separation of inputs and outputs



6.5 Step 5: Computing a Complete Test Graph (CTG) 

By analogy with [3], we construct a Complete Test Graph (CTG) from the result of
Step 4 as follows:

— Let L2A denote the set of states of the result of Step 4 from which a state A is
accessible.

— Let Pass denote the set of states A of the result of Step 4.

— Let Fail = {fail} consist of a new state that is accessible by every non-specified
output transition executable from L2A.

— Let Inconc be the set of states of the result of Step 4 that are not in $L2A \cup Pass$ and
that are accessible from L2A by a single output transition.

— We then obtain CTG by: adding (implicitly) Fail and its
incoming transitions, removing every state $q \notin L2A \cup Pass \cup Inconc \cup Fail$,
and removing outgoing transitions of every state $q \in Pass \cup Inconc$.
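The construction above amounts to a backward reachability computation followed by pruning. In the sketch below, the names `complete_test_graph` and `next_state` are hypothetical, labels are single strings with outputs starting with `"!"`, L2A is taken to exclude the states A themselves (an assumption), and Fail is kept implicit, as in the text.

```python
def complete_test_graph(transitions, accept,
                        is_output=lambda label: label.startswith("!")):
    """Compute L2A, Pass, Inconc and the pruned transitions of a CTG."""
    preds = {}
    for origin, label, dest in transitions:
        preds.setdefault(dest, set()).add(origin)
    # Backward traversal from the states A.
    reach, stack = set(accept), list(accept)
    while stack:
        state = stack.pop()
        for p in preds.get(state, ()):
            if p not in reach:
                reach.add(p)
                stack.append(p)
    pass_states = set(accept)
    l2a = reach - pass_states
    inconc = {dest for origin, label, dest in transitions
              if origin in l2a and is_output(label)
              and dest not in l2a and dest not in pass_states}
    keep = l2a | pass_states | inconc
    # Pass and Inconc states keep no outgoing transition.
    ctg = [(o, lab, d) for o, lab, d in transitions if o in l2a and d in keep]
    return l2a, pass_states, inconc, ctg

def next_state(ctg, l2a, state, label,
               is_output=lambda label: label.startswith("!")):
    """Follow one transition; a non-specified output from L2A leads to fail."""
    for origin, lab, dest in ctg:
        if origin == state and lab == label:
            return dest
    return "fail" if state in l2a and is_output(label) else None

graph = [("s0", "?a", "s1"), ("s1", "!b", "A"), ("s1", "!c", "s2"),
         ("s2", "?a", "s2"), ("s1", "?d", "s3"), ("s3", "!b", "s3")]
l2a, pass_states, inconc, ctg = complete_test_graph(graph, {"A"})
```

In the toy graph, `s2` becomes an Inconc state because it is reached by an output and cannot lead to A, whereas `s3` is removed because it is reached only by an input.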

To synthesize test sequences executable in acceptable time, we define delays $T_m$
and $T_M$ such that $T_m < T_M$, and fictitious events $!\delta_m$ and $!\delta_M$ as follows: IUT
is considered quiescent (resp. quiescent forever) if it generates no output during
a period $T_m$ (resp. $T_M$). The occurrence of $!\delta_m$ (resp. $!\delta_M$) means that Test-Controller
has detected that no output occurs during a period $T_m$ (resp. $T_M$).
We proceed in CTG as follows:

For every state $\notin Pass \cup Inconc \cup Fail$ with an outgoing (possibly selfloop)
$!\delta$: replace $!\delta$ by $!\delta_m$, and add an outgoing $!\delta_M$ that leads to a state $\in Inconc$.
Intuitively, if IUT is quiescent forever and no verdict (Pass, Inconc or Fail)
has been produced, then the verdict Inconc is generated.

For every state $q \in Pass$: insert a transition labelled $!\delta_m$ between the incoming
transitions of $q$ and $q$ itself. Intuitively, the verdict Pass is generated
only when IUT is quiescent and a state Pass has been reached.

Note that in CTG, every input (resp. output) must be interpreted as an output
(resp. input) of the tester. For the result of Step 3 shown in Fig. 8, we obtain the CTG
of Fig. 10. Transitions $!\delta_m$ and $!\delta_M$ in State 0 are irrelevant, because they can
be preempted by the only possible other transition, labelled $(?\sigma, ?Set(c_1, 2, 3))$,
which is under the control of Test-Controller. Transition $!\delta_m$ in State 3 indicates
that we are in a quiescent state. Transition $!\delta_M$ in State 3 indicates that we
are and will remain in a quiescent state; that is, $!\delta_M$ indicates that $!\rho$ will not
be generated, and thus, the sequence that leads to a State Pass will not be
completed; that is why $!\delta_M$ leads to a State Inconc. Transition $!\delta_m$ in State 4
indicates that after the execution of $!\rho$, we are in a quiescent state, and thus,
we can generate a verdict Pass. For simplicity, the state fail and its incoming
transitions are not represented; fail is implicitly reached by every non-specified
output. Note that transition $!\delta_x$ can be easily realized by using $?Set(c_0, T_x)$ and
$!Exp(c_0, T_x)$, for $x = m, M$, where $c_0$ is a clock not used for describing timing
constraints of Spec.




Fig. 10. Step 5: CTG obtained from the result of Step 4 (the latter being obtained from
the result of Step 3 shown in Fig. 8)



6.6 Step 6: Constructing Test Cases 

Similarly to [3], the objective is to extract so-called controllable subgraphs of 
CTG as follows, assuming that inputs can preempt outputs: In a state of CTG, 
either one input transition is kept and all other input and output transitions 
are pruned, or all output transitions are kept and input transitions are pruned. 
Unreachable states are suppressed. Reachability to Pass can be preserved by 
a backward traversal of CTG. See [3] for more details on this step. For our 
example, the only controllable subgraph of the CTG of Fig. 10 is obtained by
removing $!\delta_m$ and $!\delta_M$ in the initial state.
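The pruning rule can be sketched in a few lines. This is only an illustration under stated assumptions: the dictionary encoding of the CTG, the `"in"`/`"out"` tags, the function name `controllable_subgraph` and the `pick_input` policy are all hypothetical, and the toy graph is not the CTG of Fig. 10. The backward traversal that preserves reachability to Pass is not shown.

```python
from collections import deque

def controllable_subgraph(ctg, init, pick_input):
    """ctg maps a state to a list of (kind, label, target) with kind "in" or "out".

    pick_input(state, inputs) returns the single input transition to keep,
    or None to keep all output transitions of that state instead.
    Only states reachable after pruning are kept.
    """
    sub = {}
    todo = deque([init])
    while todo:
        s = todo.popleft()
        if s in sub:
            continue
        trans = ctg.get(s, [])
        inputs = [t for t in trans if t[0] == "in"]
        outputs = [t for t in trans if t[0] == "out"]
        chosen = pick_input(s, inputs) if inputs else None
        # Either one input is kept (everything else pruned), or all outputs.
        sub[s] = [chosen] if chosen is not None else outputs
        todo.extend(t[2] for t in sub[s])
    return sub

# Hypothetical toy graph: one input competes with two quiescence outputs.
ctg = {
    0: [("in", "a", 1), ("out", "delta_m", "Inconc"), ("out", "delta_M", "Inconc")],
    1: [("out", "p", "Pass")],
}
sub = controllable_subgraph(ctg, 0, lambda s, ins: ins[0])
```

With this policy the input is always preferred, so the two quiescence transitions of the initial state disappear and the state Inconc becomes unreachable.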

Note that by construction, all test cases are feasible. This is because a
SEA produced by SetExp does not contain infeasible paths.

7 Conclusion and Future Work 

We extend the test method TGV [3] to the real-time case. Our approach is
based on a method that transforms a TA into an equivalent FSA,
called Set-Exp-Automaton (SEA), using two additional types of events: Set and
Exp. In comparison with some related work [7,8,9,10,11,12,13,14] (for lack of
space, we do not cite other references), we support
nondeterminism and internal actions, define quiescent states in the real-time
case, define the notion of forced transitions, and use exact time information
instead of discrete time. Our method is also relatively simple. Some
directions for future work are:

— We intend to implement our method and apply it to complex examples. We
have started to work on this aspect by implementing the transformation
SetExp.

— We have noted that, in the theoretical worst case, there is a state explosion
problem with the obtained SEA and CTG. We believe that such a worst
case is very rare. We intend to investigate this issue in the
near future.

— Our method does not support unobservable clock resets (i.e., internal actions 
with clock reset). We intend to determine conditions under which our method 
is applicable in the presence of unobservable clock reset. 

— We assume that the IUT is centralized. When the IUT is distributed over several
sites: 1) a distributed test architecture consisting of several testers must
be designed, and 2) every synthesized test case must be distributed into
local test cases which are executed by the different testers, respectively. This
distributed aspect will be studied in the near future.

— In a test sequence, the exact instants of inputs/outputs are not specified. 
During each test execution, the tester has to select an exact instant of each 
input a in the period of time when a is enabled. Therefore, the same given test 
sequence must be tested several times, for different instants of inputs. Since 
the number of instants at which any input may be applied is theoretically 
infinite, we will propose a method for selecting a finite number of relevant 
instants. 

Abstract. In this paper we propose an approach to automatically produce
test cases that check whether a given implementation satisfies a linear
property. Linear properties can be expressed by formulas
of temporal logic. An observer is built from each formula. An observer is
a finite automaton on infinite sequences. Of course, testing the satisfiability
of an infinite sequence is not possible. Thus, we introduce the notion
of bounded properties. Test cases are generated from a (possibly partial)
specification of the IUT, and the property to validate is expressed by a
parameterised automaton on infinite words. This approach is formally
defined, and a practical test generation algorithm is sketched.



1 Introduction 

Testing is certainly one of the most popular software validation techniques and it
is a crucial activity in many domains such as embedded systems, critical systems,
information systems, telecommunication, etc. Consequently, a lot of work was
carried out during the last decade both to formalise the testing activities and to
develop tools that automate the production and execution of test suites.

The particular problem of testing whether an implementation is “correct” with
respect to its specification is referred to as conformance testing. This problem
was mainly investigated inside the telecommunication area (as described in the
ISO standard 9646 [7]), and a formal approach was outlined in [19,11,3]. These
works gave birth to several (academic and commercial) tools [1,12,17,6] able to
automatically generate test cases from a system specification. For instance, in [5],
a technique is proposed to derive test cases from a (formal) specification and a
test purpose. This technique is based on a partial exploration of a kind of product
between the specification and the test purpose. An associated tool, called Tgv,
was developed by Irisa Rennes and Verimag Grenoble.

We first make the concepts of black box testing and conformance
testing more explicit.

Black Box Testing. We consider here “black box” testing, meaning that the
behaviour of the IUT (Implementation Under Test) is only visible to an external
tester, through a restricted test interface (called PCO, for Points of Control
and Observation). There exist two kinds of interactions between the tester and
the IUT: outputs of the tester are stimuli sent in order to control the IUT,
whereas inputs of the tester are observations of the IUT’s outputs. These sets of
interactions are described by a test architecture. In black box testing the internal
state of the IUT is not observable by the tester. Consequently:

— the tester cannot observe the internal non-determinism of the IUT;

— the tester should remain deterministic since it cannot backtrack the IUT to
a given internal state.

A possible model to describe these sequences of interactions is the Input-Output
Labelled Transition System (IOLTS, see definition below).

Conformance Testing. Conformance testing is based on the following concepts: 

— IUT: Even if the internal code of the IUT is not visible from the outside,
its behaviour can be characterised by its interactions with its environment.
This external behaviour can be modelled with an IOLTS. We suppose in the
following that this IOLTS is input complete, that is, in each state the IUT
cannot refuse any input proposed by the environment.

— Test architecture: The test architecture defines the set of interactions between
the IUT and an environment, distinguishing between controllable and
observable events. The ISO 9646 standard proposes four test methods: local test
method, distributed test method, coordinated test method, and remote test
method. All these methods are based on the black box testing principles and
describe possible environments of the IUT. Of course, the test architecture is a
parameter of a test generation technique. In this paper, we consider a local
test architecture.

— Specification: The specification represents the expected behaviour of the IUT,
to be used as a reference by the tester. This expected behaviour can also be
formally modelled by an IOLTS. Note that the specification does not necessarily
describe only the visible behaviour of the IUT; it may also contain some
of the internal actions performed by the implementation.

— Conformance relation: Defining whether an IUT is correct or not with
respect to a specification is achieved in this context by introducing a formal
relation between IOLTS. Several relations have been proposed so far, such
as ioco [19]. Other relations have also been proposed on other models (such
as conf [3]).

— Test case: Roughly speaking, a test case is a set of interaction (input and
output) sequences a tester can perform on an IUT interface. When executed,
each interaction sequence delivers a verdict indicating whether or not the IUT was
found to conform on this particular execution (with respect to a given
conformance relation).

— Test purpose: The test purpose represents a particular functionality (or sets
of abstract scenarios) the user wants to test. It can also be modelled by an
IOLTS, and may be used to automate the test case generation.

Although this conformance testing framework is now well established and
happens to be very useful in the telecommunication area, its use in other
application domains suffers, in our opinion, from two important limitations:

— first, it requires a rather exhaustive formal specification, since conformance is
defined with respect to this specification, and any IUT exhibiting unexpected
behaviours (from this specification point of view) would be rejected;

— second, the conformance relation is not very flexible: it is not always easy
to understand what exactly it preserves, and, more importantly, it is not
possible to adapt it to the particular functionality one wants to test.

We propose in this work to extend this framework (and particularly what was
done inside the Tgv tool) to the generation of property oriented test cases. The
general idea is to allow automatic test generation from a partial specification (not
necessarily expressing the overall expected behaviour of the system), and with
respect to a particular property (test case execution should indicate whether or
not the IUT satisfies this property). This approach is outlined below.

Property Testing. The properties we consider are linear properties: each property
defines a language (i.e., a set of sequences), and an IUT satisfies a given property
if and only if all its execution sequences belong to its associated language. In this
context it is a common practice to distinguish between safety properties, that
can be checked by considering only finite execution sequences of the IUT, and
liveness properties that need to consider also the infinite ones. Several
characterisations of such properties have been proposed in the verification community,
based on various specification formalisms: automata on infinite words
(recognising $\omega$-regular languages), linear-time temporal logics (or $\mu$-calculus), boolean
equation systems, etc. Automata on infinite words, like Büchi automata [4], are
very interesting from an algorithmic point of view, and they are used in several
decision procedures [20] implemented in model checkers. It can be shown in
particular that any $\omega$-regular language can be characterised by a Büchi automaton,
or, equivalently, by a deterministic Rabin automaton, see for example [9,16].
Since the use of a deterministic automaton is an important issue in the test
generation technique we propose in this paper, we will consider in the following that
the property to be checked is expressed by a deterministic Rabin automaton. Of
course, testing the satisfiability of a liveness property is not possible: it would
require an infinite execution time. However, automata on infinite words can be
parameterised to specify so-called bounded liveness properties: the automaton
recognises a set of infinite execution sequences and some external parameters
simply limit the “length” of the sequences to consider (this length being
expressed for instance in number of interactions, or as an overall execution time).

More precisely, the test generation technique we propose can be sketched as 
follows: 

— A (possibly partial) specification $S$ is used as a “guideline” for the test case
synthesis, and it is therefore supposed to be “close enough” to the actual
behaviour of the IUT. Note however that we do not require at this level any
particular conformance relation between $S$ and the IUT.

— A safety or bounded liveness property $\mathcal{P}$ is given through an observer Obs
recognising sequences of $\neg\mathcal{P}$. This observer is a parameterised automaton on
infinite words.

— Test cases are automatically generated by traversing the specification in
order to find the “most promising” execution sequences able to show the
non-satisfiability of $\mathcal{P}$ by the IUT. These execution sequences are the sequences
recognised by Obs that are the “closest” to the ones provided by the
specification.

Related Work. Producing test cases from a formal specification to check the
satisfiability of a given property is a rather natural idea, and consequently
numerous works have already been carried out in this area, leading to various kinds
of tools. They mostly differ in the nature of the specification and property they
consider, and they are often based on probabilities to select the test sequences
(such as in [15,10,8,13]). However, an original aspect of our approach is the use of
parameterised automata on infinite words to specify properties and to
instantiate them only at test time. In addition, the test cases we produce are IOLTS (not
only sequence sets) that can be executed against non-deterministic IUTs.

2 Models 

This section formalises the different elements involved in the test case generation 
framework we propose. 

2.1 Input-Output Labelled Transition Systems

The basic models we consider are based on Input-Output Labelled Transition
Systems (IOLTS), namely Labelled Transition Systems in which input and
output actions are differentiated (due to the asymmetrical nature of the
testing activity). We consider a finite alphabet of actions $A$, partitioned into two
sets: input actions $A_I$ and output actions $A_O$. A (finite) IOLTS is a quadruplet
$M = (Q^M, A^M, T^M, q^M_{init})$ where $Q^M$ is the finite set of states, $q^M_{init}$ is the initial
state, $A^M \subseteq A$ is a finite alphabet of actions, and
$T^M \subseteq Q^M \times (A^M \cup \{\tau\}) \times Q^M$ is
the transition relation. Internal actions are denoted by the special label $\tau \notin A$.
$\tau$ is assumed to be unobservable for the system’s environment whereas actions
of $A^M$ are visible actions representing the interactions either between the system
and its environment, or between its internal components.

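Notations. We denote by $\mathbb{N}$ the set of non-negative integers. For each set $X$,
$X^*$ (resp. $X^\omega = [\mathbb{N} \to X]$) denotes the set of finite (resp. infinite) sequences on
$X$. Let $\sigma \in X^*$; $\sigma_i$ or $\sigma(i)$ denotes the $i$-th element of $\sigma$. We adopt the following
notations and conventions: Let $\sigma \in A^*$, $a \in A$, $p, q \in Q^M$. We write
$p \xrightarrow{a}_M q$ iff $(p, a, q) \in T^M$, and $p \xrightarrow{\sigma}_M q$ iff
$\exists \sigma_1, \sigma_2, \ldots, \sigma_n \in A$, $p_0, \ldots, p_n \in Q^M$ such that
$\sigma = \sigma_1.\sigma_2 \ldots \sigma_n$ and $p_0 = p$, $p_i \xrightarrow{\sigma_{i+1}}_M p_{i+1}$ for $i < n$, $p_n = q$.
In this case, $\sigma$ is called a trace or execution sequence, and $p_0 \cdots p_n$ a run over $\sigma$.
An infinite run of $M$ over an infinite execution sequence $\sigma$ is an infinite sequence $\rho$ of $Q^M$ such
that 1. $\rho(0) = q^M_{init}$ and 2. $\rho(i) \xrightarrow{\sigma(i)}_M \rho(i+1)$. $\mathit{inf}(\rho)$ denotes the set of symbols
from $Q^M$ occurring infinitely often in $\rho$:
$\mathit{inf}(\rho) = \{q \mid \forall n.\ \exists i.\ i \geq n \wedge \rho(i) = q\}$.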

Let $V$ be a subset of the alphabet $A$. We define a projection operator
$\downarrow_V : A^* \to V^*$ in the following manner: $\epsilon \downarrow_V = \epsilon$,
$(a.\sigma) \downarrow_V = \sigma \downarrow_V$ if $a \notin V$, and
$(a.\sigma) \downarrow_V = a.(\sigma \downarrow_V)$ if $a \in V$. This operator can be extended to a language $L$ (and we note
$L \downarrow_V$) by applying it to each sequence of $L$. The language recognised by $M$ is
$\mathcal{L}(M) = \{\sigma \mid \exists q \text{ such that } q^M_{init} \xrightarrow{\sigma}_M q\}$.
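The projection operator can be illustrated with a direct transcription of its three defining cases. The sketch below is only an illustration: the function names `project` and `project_language` and the representation of sequences as tuples of strings are assumptions made here.

```python
def project(seq, V):
    """Projection of a finite sequence on the sub-alphabet V (recursive form)."""
    if not seq:                      # empty sequence projects to itself
        return ()
    head, tail = seq[0], seq[1:]
    if head in V:                    # (a.sigma) projected keeps a when a is in V
        return (head,) + project(tail, V)
    return project(tail, V)          # otherwise a is erased

def project_language(language, V):
    """Extension to a (finite) set of sequences, applied element by element."""
    return {project(seq, V) for seq in language}

trace = ("a", "tau", "b", "a")
visible = project(trace, {"a", "b"})
```

Two different sequences may have the same projection, so the projected language can be smaller than the original one.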

Let $M = (Q^M, A^M, T^M, q^M_{init})$ be an IOLTS. We recall the completeness, determinism
and quiescence notions.

Completeness. $M$ is complete with respect to a set of actions $X \subseteq A$ if and only if
for each state $p^M$ of $Q^M$ and for each action $x$ of $X$, there is at least one outgoing
transition of $T^M$ from $p^M$ labelled by $x \in X$:

$\forall p^M \in Q^M \cdot \forall x \in X \cdot \exists q^M \in Q^M$ such that $p^M \xrightarrow{x}_M q^M$.

Determinism. $M$ is said to be deterministic with respect to a set of actions $X$ if and
only if it is a deterministic IOLTS containing only actions labelled by elements
of $X$:

$\forall p^M \in Q^M \cdot \forall x \in X \cdot p^M \xrightarrow{x}_M q^M \wedge p^M \xrightarrow{x}_M q'^M \Rightarrow q^M = q'^M$.

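We introduce a determinisation operator
$det(M, X) = (Q^{det(M,X)}, A^{det(M,X)}, T^{det(M,X)}, q^{det(M,X)}_{init})$ to compute a
deterministic IOLTS with respect to $X$ associated to $M$. This IOLTS is defined as follows:

$Q^{det(M,X)} \subseteq 2^{Q^M}$, $A^{det(M,X)} = X$,
$q^{det(M,X)}_{init} = \{q \in Q^M \mid q^M_{init} \xrightarrow{\omega}_M q \wedge \omega \in (A^M \setminus X)^*\}$,

and $T^{det(M,X)} = \{(S_p, a, S_q) \mid \exists p \in S_p.\ \exists q \in S_q.\ p \xrightarrow{a.\omega}_M q$ with $a \in X \wedge \omega \in (A \setminus X)^*\}$.
Note that $\mathcal{L}(M) \downarrow X = \mathcal{L}(det(M, X))$.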
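One common way to realise such an operator is the subset construction combined with a closure over the actions that are not in X. The sketch below follows that standard technique and is not necessarily the construction used by the authors; the encoding of transitions as `(source, action, target)` triples and the names `closure` and `det` are assumptions made for this example.

```python
def closure(states, trans, X):
    """All states reachable from `states` through actions that are not in X."""
    seen, stack = set(states), list(states)
    while stack:
        p = stack.pop()
        for (src, a, dst) in trans:
            if src == p and a not in X and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return frozenset(seen)

def det(trans, q_init, X):
    """Subset construction: returns (macro-states, transitions, initial macro-state)."""
    init = closure({q_init}, trans, X)
    macro_states, macro_trans, todo = {init}, set(), [init]
    while todo:
        S = todo.pop()
        for a in X:
            step = {dst for (src, b, dst) in trans if src in S and b == a}
            if not step:
                continue
            T = closure(step, trans, X)      # absorb the invisible suffix
            macro_trans.add((S, a, T))
            if T not in macro_states:
                macro_states.add(T)
                todo.append(T)
    return macro_states, macro_trans, init

# Hypothetical IOLTS with one internal move and a non-deterministic choice on "a".
trans = {(0, "tau", 1), (0, "a", 2), (1, "a", 3), (2, "b", 4), (3, "b", 4)}
macro_states, macro_trans, init = det(trans, 0, {"a", "b"})
```

Each macro-state groups the states of M that cannot be distinguished by an observer restricted to X.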

Quiescence. A test should be able to observe IUT quiescence [19]. Several kinds
of quiescence may happen: a state $p$ is said to be quiescent in $M$ either if it has no
outgoing transition (deadlock), or if it belongs to a cycle of internal transitions
(livelock):

$quiescent(p) \equiv (\nexists (a, q).\ p \xrightarrow{a}_M q) \vee p \xrightarrow{\tau^+}_M p$

Quiescence can be modelled at the IOLTS level by introducing an extra
transition to each quiescent state labelled by a special symbol $\delta$. $\delta$ is considered as an
output (observable by the environment). In practice, quiescence is observed
by means of timers: a timeout occurs if and only if the implementation is locked
inside a quiescent state. Formally, we handle quiescence by associating to LTS $M$
its so-called “suspension automaton” $\delta(M) = (Q^M, A^M \cup \{\delta\}, T^{\delta(M)}, q^M_{init})$ where

$T^{\delta(M)} = T^M \cup \{(p, \delta, p) \mid p \in Q^M \wedge quiescent(p)\}$.
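The suspension automaton can be sketched as follows, under the assumptions that transitions are `(source, action, target)` triples, that the internal action is written `"tau"` and that the quiescence label is written `"delta"`. These encodings and the function names are illustrative choices, not part of the original definitions.

```python
TAU, DELTA = "tau", "delta"

def quiescent(p, trans):
    """True if p is a deadlock or lies on a cycle of internal transitions."""
    outgoing = [(a, q) for (src, a, q) in trans if src == p]
    if not outgoing:
        return True                                   # deadlock
    seen, stack = set(), [q for (a, q) in outgoing if a == TAU]
    while stack:                                      # search for a tau-cycle through p
        q = stack.pop()
        if q == p:
            return True                               # livelock
        if q not in seen:
            seen.add(q)
            stack.extend(r for (src, a, r) in trans if src == q and a == TAU)
    return False

def suspension(states, trans):
    """Add a delta self-loop on every quiescent state."""
    return set(trans) | {(p, DELTA, p) for p in states if quiescent(p, trans)}

states = {0, 1, 2, 3}
trans = {(0, "a", 1), (1, TAU, 2), (2, TAU, 1), (0, "b", 3)}
susp = suspension(states, trans)
```

In this toy system, states 1 and 2 form a livelock and state 3 is a deadlock, so all three receive a `delta` loop while state 0 does not.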

2.2 Specification and Implementation 

The system specification is in general expressed using a dedicated language
or notation (SDL, Lotos, UML, etc). The operational semantics of this
language can be described in terms of IOLTS. Thus, we note the specification
$S = (Q^S, A^S, T^S, q^S_{init})$, with $A^S = A^S_I \cup A^S_O$.

The Implementation Under Test (IUT) is assumed to be a “black box” whose
behaviour is known by the environment only through a restricted interface (a
set of inputs and outputs). From a theoretical point of view, it is convenient to
consider the IUT behaviour as an IOLTS $IUT = (Q^{IUT}, A^{IUT}, T^{IUT}, q^{IUT}_{init})$, where
$A^{IUT} = A^{IUT}_I \cup A^{IUT}_O$. We assume in addition that this IUT is
complete with respect to $A_I$ (it never refuses an unexpected input), and that
the specification $S$ is a partial IOLTS of the IUT:

$A^S \subseteq A^{IUT}$ and $\mathcal{L}(S) \subseteq \mathcal{L}(IUT) \downarrow (A^S)$.

Intuitively, a specification is partial if each trace of the specification may be
executed by the IUT (but the IUT may contain unspecified behaviours).

2.3 Property and Satisfiability Relation 

The objective of this work is to generate test cases that check the
satisfiability of some classes of properties on a given IUT. In particular we
restrict ourselves to linear properties, whose associated models are sets of IOLTS
execution sequences. Two kinds of linear properties can be considered: safety
properties, characterised by finite execution sequences, and liveness properties,
characterised by infinite ones. Thus, an IUT will satisfy a given linear property
$\mathcal{P}$ if and only if all of its execution sequences belong to the model of $\mathcal{P}$.

From the test point of view, only the (non-)existence of a finite execution
sequence can be checked on a given IUT (since the test execution time has to
remain bounded). This restricts in practice the test activity to the validation of
safety properties. Nevertheless, an interesting sub-class of safety properties is
the so-called bounded liveness. Such properties make it possible, for instance, to express
that the IUT will exhibit a particular behaviour within a given amount of time,
or before a given number of iterations has been reached. From a practical point
of view, it is very useful to express such properties as liveness (i.e., in terms of
infinite execution sequences, telling that the expected behaviour will eventually
happen), and then to bound their execution only at test time. The main
advantage is that the “bounds” are not part of the test generation process, and they
can be chosen depending on the concrete test conditions. Therefore, we propose
in this section to specify the properties of interest using a general model,
allowing both finite and infinite execution sequences to be expressed. This model is then
“parameterised” to handle bounded liveness properties.

Automata on Infinite Words. Several acceptance conditions (Büchi, Muller,
Streett, Rabin, etc) have been proposed to extend finite-state IOLTS to
recognise infinite sequences. We recall the definition of Büchi and Rabin automata
and illustrate on an example the difference between them.

Definition 1. A Büchi automaton $R_b$ is a structure $(B, Q_F^B)$ where
$B = (Q^B, A^B, T^B, q^B_{init})$ is an IOLTS and $Q_F^B$ is a subset of $Q^B$. The automaton $R_b$ accepts an
infinite execution $\sigma$ of $(A^B)^\omega$ if there is an infinite run $\rho$ of $B$ over $\sigma$ such that

$\mathit{inf}(\rho) \cap Q_F^B \neq \emptyset$.

Definition 2. A Rabin automaton $R_a$ is a structure $(R, \mathcal{T}^R)$ where
$R = (Q^R, A^R, T^R, q^R_{init})$ is an IOLTS and
$\mathcal{T}^R = ((L_1^R, U_1^R), (L_2^R, U_2^R), \ldots, (L_k^R, U_k^R))$ is a pairs table
with $L_i^R, U_i^R \subseteq Q^R$ for $i \in \{1, 2, \ldots, k\}$. The automaton $R_a$ accepts an infinite
execution $\sigma$ of $(A^R)^\omega$ if there is an infinite run $\rho$ of $R$ over $\sigma$ such that for some
$i \in \{1, 2, \ldots, k\}$, $\mathit{inf}(\rho) \cap L_i^R \neq \emptyset$ and $\mathit{inf}(\rho) \cap U_i^R = \emptyset$.



Fig. 1. Non-deterministic Büchi automaton recognising $(d + n)^* d^\omega$

Fig. 2. Deterministic Rabin automaton recognising $(d + n)^* d^\omega$



Example. As an example, consider the following property: “The system always
comes back to its nominal mode (action n) after entering a degraded one (action
d)”. This property can be expressed by the following ($\omega$-regular) language:
$L = (d^* n)^\omega$. The negation of this property is expressed by $\bar{L} = (d + n)^* d^\omega$, which
is not recognisable by a deterministic Büchi automaton. The non-deterministic
Büchi automaton recognising $\bar{L}$ is given by figure 1, with accepting set $\{2\}$ and
initial state 1.

Consider now the deterministic automaton of figure 2 as a Büchi automaton,
with accepting set $\{2\}$ and initial state 1. This automaton accepts every sequence
containing infinitely many occurrences of d, including sequences that also contain
infinitely many occurrences of n, which are not in $\bar{L}$.

Now, if the automaton of figure 2 is considered as a Rabin automaton with
the pair table $(\{2\}, \{1\})$, then this automaton recognises exactly $\bar{L}$ (it accepts
an infinite word iff it has infinitely many occurrences of d and only finitely many
occurrences of n). Thus, we consider in
this paper deterministic Rabin automata [14] since they recognise all
$\omega$-regular languages.
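The difference between the two acceptance conditions can be checked mechanically on ultimately periodic words of the form prefix.loop^ω. The sketch below assumes a two-state deterministic automaton in which reading d leads to state 2 and reading n leads to state 1; this shape is inferred from the pair table given in the example and is an assumption, since the figure itself is not reproduced here. All function names are illustrative.

```python
def inf_states(delta, q0, prefix, loop):
    """States visited infinitely often on the word prefix.loop^omega
    by a deterministic automaton given as delta[(state, action)] = state."""
    q = q0
    for a in prefix:
        q = delta[(q, a)]
    seen_at, visited = {}, []        # loop-entry state -> iteration index
    while q not in seen_at:
        seen_at[q] = len(visited)
        states = []
        for a in loop:
            q = delta[(q, a)]
            states.append(q)
        visited.append(states)
    # Iterations from the first repeated entry state onwards recur forever.
    return {s for iteration in visited[seen_at[q]:] for s in iteration}

def buchi_accepts(inf, accepting):
    return bool(inf & accepting)

def rabin_accepts(inf, pairs):
    return any(bool(inf & L) and not (inf & U) for (L, U) in pairs)

# Assumed shape of the deterministic automaton: d -> state 2, n -> state 1.
delta = {(1, "d"): 2, (1, "n"): 1, (2, "d"): 2, (2, "n"): 1}
inf_dn = inf_states(delta, 1, "", "dn")    # (dn)^omega, not in (d+n)*d^omega
inf_nd = inf_states(delta, 1, "n", "d")    # n.d^omega, in (d+n)*d^omega
```

On (dn)^ω the Büchi reading with accepting set {2} wrongly accepts, while the Rabin pair ({2}, {1}) rejects because state 1 is also visited infinitely often.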

As another example, figure 3 shows a Rabin automaton with pair $(L, U)$
equal to $(\{3\}, \emptyset)$ recognising execution sequences in which a req action is at
some point followed by an error action. The $\delta$-loop on state 3 indicates that a
finite execution sequence terminating after an error action is recognised by this
automaton. This artefact makes it possible to deal with both finite and infinite execution
sequences.

Rabin automata are a natural model to express liveness properties. However,
to correctly handle bounded liveness as well, we need to “parameterise” these
automata in order to limit the size of the infinite execution sequences they
recognise. The (simple) solution we propose consists in associating a counter to each
state belonging to an $(L_i, U_i)$ pair. An execution sequence $\sigma$ is now recognised
if and only if it visits “sufficiently often” an $L_i$-state, and “not too often” a
$U_i$-state, according to the counters associated to these sets (whose actual values
will be instantiated at test time).

Fig. 3. Example of a safety property expressed by a Rabin automaton

Definition 3. A Parameterised Rabin automaton is a tuple $PR_a = (R, \mathcal{T}^R, C^R)$
where $(R, \mathcal{T}^R)$ is a Rabin automaton and $C^R = ((cl_1, cu_1), \ldots, (cl_k, cu_k))$ with
$cl_i, cu_i \in \mathbb{N}$. An execution sequence $\sigma$ is accepted by $PR_a$ if and only if there is
a finite run $\rho$ of $PR_a$ on $\sigma$ such that for some $i \in \{1, 2, \ldots, k\}$

$|\{j \mid \rho(j) \in L_i^R\}| \geq cl_i$ and $|\{j \mid \rho(j) \in U_i^R\}| \leq cu_i$.

Thus, the language accepted by $PR_a$ is $\mathcal{L}(PR_a)$, the set of sequences accepted by
$PR_a$.
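The counting condition can be sketched on a finite run as follows. This is a minimal illustration that assumes the two bounds are read as non-strict (at least cl visits to L, at most cu visits to U); the function name `pr_accepts` and the list-based encoding of runs, pairs and counters are hypothetical.

```python
def pr_accepts(run, pairs, counters):
    """run: list of visited states; pairs: [(L, U)]; counters: [(cl, cu)]."""
    for (L, U), (cl, cu) in zip(pairs, counters):
        l_visits = sum(1 for q in run if q in L)
        u_visits = sum(1 for q in run if q in U)
        if l_visits >= cl and u_visits <= cu:   # one satisfied pair is enough
            return True
    return False

run = [1, 2, 1, 2, 2]
pairs = [({2}, {1})]
accepted_loose = pr_accepts(run, pairs, [(3, 2)])   # 3 visits to L, 2 to U
accepted_tight = pr_accepts(run, pairs, [(3, 1)])   # U bound exceeded
```

The same automaton therefore accepts or rejects a given run depending only on the counter values, which is what allows the bounds to be chosen at test time.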

Observer and Satisfiability Relation. Test case generation with respect to a
linear property $\mathcal{P}$ is facilitated by considering an observer automaton recognising
exactly the execution sequences of $\neg\mathcal{P}$. Since we want to deal with safety and
bounded liveness properties we choose here to model these observers as
deterministic Parameterised Rabin automata $Obs = (O, \mathcal{T}^O, C^O)$. We are now able
to formally define the satisfiability relation we consider between an IUT
and a linear property.

Definition 4. Let IUT be an IOLTS, $\mathcal{P}$ a property, and $Obs = (O, \mathcal{T}^O, C^O)$ the
observer recognising the sequences of $\neg\mathcal{P}$, where $O = (Q^O, A^O, T^O, q^O_{init})$. Then,
IUT satisfies $\mathcal{P}$ iff $(\mathcal{L}(IUT) \downarrow A^O) \cap \mathcal{L}(O) = \emptyset$. That is, none of the observable
execution sequences of the IUT are recognised by the observer.

2.4 Test Architecture and Test Case 

Test Architecture. At the abstract level we consider, a test architecture is simply
a pair $(A_c, A_u)$ of action sets, each of them being a subset of $A$: the set of
controllable actions $A_c$, initiated by the tester, and the set of observable actions
$A_u$, observed by the tester. A test architecture is said to be compliant with an
observer Obs if it satisfies the following constraints: $A_I^O \subseteq A_c$ and $A_O^O \subseteq A_u$.
In other words the tester needs at least to be able to control (resp. observe) all
inputs (resp. outputs) appearing in the observer.

Test Cases. For a given observer Obs and a test architecture $(A_c, A_u)$ compliant
with $O$, an (abstract) test case is a Parameterised Rabin automaton
$(TC, \mathcal{T}^{TC}, C^{TC})$ with $TC = (Q^{TC}, A^{TC}, T^{TC}, q^{TC}_{init})$ and satisfying the following requirements:

1. $A^{TC} = A^{TC}_I \cup A^{TC}_O$ with $A^{TC}_O = A_c$ and $A^{TC}_I \subseteq A_u$. Note that $A_c$ (resp. $A_u$) is
the set of controllable (resp. observable) actions.

2. TC is deterministic wrt $A^{TC}$, controllable (for each state of $Q^{TC}$ there is
at most one outgoing transition labelled by an action of $A_c$), and
input-complete (for each state of $Q^{TC}$, for each element $a$ of $A_u$, there exists exactly
one outgoing transition labelled by $a$).

3. The pair table $\mathcal{T}^{TC} = ((L_1^{TC}, U_1^{TC}), \ldots, (L_k^{TC}, U_k^{TC}))$ is defined with respect
to the observer Obs:

$q^{TC} \in L_i^{TC}$ (resp. $U_i^{TC}$) iff there exist a sequence $\sigma$ and a state $q^O \in L_i^O$ (resp. $U_i^O$) such that
$q^O_{init} \xrightarrow{\sigma}_O q^O$ and $q^{TC}_{init} \xrightarrow{\sigma}_{TC} q^{TC}$.

The last condition expresses that there is an execution sequence of the test case
starting from the initial state of the test case and leading to a state of $L_i^{TC}$ (resp.
$U_i^{TC}$) if there is a corresponding execution sequence of the observer starting from
the initial state of the observer and leading to a state of $L_i^O$ (resp. $U_i^O$).
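The controllability and input-completeness requirements are simple local checks on each state. A minimal sketch follows, assuming transitions are `(source, action, target)` triples; the function name `is_well_formed` and the toy test case are hypothetical.

```python
def is_well_formed(states, trans, A_c, A_u):
    """Check controllability and input-completeness of a candidate test case."""
    for q in states:
        labels = [a for (p, a, r) in trans if p == q]
        # Controllable: at most one outgoing transition on a controllable action.
        if sum(1 for a in labels if a in A_c) > 1:
            return False
        # Input-complete: exactly one outgoing transition per observable action.
        for a in A_u:
            if labels.count(a) != 1:
                return False
    return True

states = {0, 1}
good = {(0, "req", 1), (0, "ack", 0), (1, "ack", 0)}
bad = {(0, "req", 1), (0, "ack", 0)}          # state 1 cannot observe "ack"
ok = is_well_formed(states, good, {"req"}, {"ack"})
not_ok = is_well_formed(states, bad, {"req"}, {"ack"})
```

The third requirement, on the pair table, relates the test case to the observer and is not covered by this local check.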

2.5 Test Cases Execution and Verdicts 

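Let $IUT = (Q^{IUT}, A^{IUT}, T^{IUT}, q^{IUT}_{init})$ be an implementation, $(TC, \mathcal{T}^{TC}, C^{TC})$ a test case
with $TC = (Q^{TC}, A^{TC}, T^{TC}, q^{TC}_{init})$, and $(A_c, A_u)$ a test architecture. The test
execution of TC on IUT can be modelled by a parallel composition between IUT
and TC with synchronisations on action sets $A_c$ and $A_u$. More formally this test
execution can be described by an IOLTS $\mathcal{E} = (Q^{\mathcal{E}}, A^{\mathcal{E}}, T^{\mathcal{E}}, q^{\mathcal{E}}_{init})$, where $A^{\mathcal{E}} = A^{TC}$,
and sets $Q^{\mathcal{E}}$ and $T^{\mathcal{E}}$ are defined as follows:

— $Q^{\mathcal{E}}$ is a set of configurations. A configuration is a triplet $(p^{IUT}, p^{TC}, \lambda)$ where
$p^{IUT} \in Q^{IUT}$, $p^{TC} \in Q^{TC}$ and $\lambda$ is a partial function from $Q^{TC}$ to $\mathbb{N}$, which
counts the number of times an execution sequence visits a state belonging
to $L_i^{TC}$ or $U_i^{TC}$.

— $T^{\mathcal{E}}$ is the set of transitions $(p^{IUT}, p^{TC}, \lambda) \xrightarrow{a}_{\mathcal{E}} (q^{IUT}, q^{TC}, \lambda')$ such that

• $p^{IUT} \xrightarrow{a}_{IUT} q^{IUT}$, $p^{TC} \xrightarrow{a}_{TC} q^{TC}$, and

• $\lambda'(q^{TC}) = \lambda(q^{TC})$ if $q^{TC} \notin \bigcup_i (L_i^{TC} \cup U_i^{TC})$, and
$\lambda'(q^{TC}) = \lambda(q^{TC}) + 1$ if $q^{TC} \in \bigcup_i (L_i^{TC} \cup U_i^{TC})$.

The initial configuration is $(q^{IUT}_{init}, q^{TC}_{init}, \lambda_{init})$, where for all $q$, $\lambda_{init}(q) = 0$.

$T^{\mathcal{E}}$ describes the interactions between the IUT and the test case. Each counter
associated with a state of $L_i^{TC} \cup U_i^{TC}$ is incremented when an execution sequence
visits this state.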

Verdicts. Test execution is supposed to deliver some verdicts to indicate whether
the IUT was found correct or not. These verdicts can be formalised as a function
on runs of $\mathcal{E}$ to the set {Pass, Fail}. More precisely:

Fail. The execution of a run $\rho$ of $\mathcal{E}$ on $\sigma$ gives the verdict Fail if and only if
there is an $i \in \{1, 2, \ldots, k\}$ and an $l \in \mathbb{N}$ such that

1. $\rho(l) = (p_l^{IUT}, p_l^{TC}, \lambda_l)$, $p_l^{TC} \in L_i^{TC}$ and $\lambda_l(p_l^{TC}) \geq cl_i$, and

2. for each $m \in [0 \cdots l]$, $\rho(m) = (p_m^{IUT}, p_m^{TC}, \lambda_m)$ satisfies $\lambda_m(p_m^{TC}) \leq cu_i$
whenever $p_m^{TC} \in U_i^{TC}$. In this case, the property is not satisfied.

Pass. Similarly, the execution of a run $\rho$ gives the verdict Pass iff $\forall i \in \{1, 2, \ldots, k\}$
and $\forall l \in \mathbb{N}$

1. $\rho(l) = (p_l^{IUT}, p_l^{TC}, \lambda_l)$ and $p_l^{TC} \in L_i^{TC}$ implies $\lambda_l(p_l^{TC}) < cl_i$, or

2. there is an $m \in [0 \cdots l]$ such that $\rho(m) = (p_m^{IUT}, p_m^{TC}, \lambda_m)$, $p_m^{TC} \in U_i^{TC}$ and $\lambda_m(p_m^{TC}) > cu_i$.

In practice the test case execution can be performed as follows:

— At each step of the execution the controllability condition may give a choice
between a controllable and an observable action. In this situation the tester
can first wait for the observable action to occur (using a local timer), and
then choose to execute the controllable one.

— Formal parameters $C^{TC}$ are instantiated according to the actual test
environment. Counters are then associated to each of the sets $U_i^{TC}$ and $L_i^{TC}$. These
counters, initialised to 0, are incremented (inside the test case) whenever a
corresponding state is reached during test execution. Thus, a Fail verdict is
issued as soon as an incorrect execution sequence is reached (according to the
definition above), and a Pass verdict is issued either if the current execution
sequence visits a state of $U_i^{TC}$ “too often” ($\lambda > cu_i$), or if a
global timer, started at the beginning of test execution, expires. This last
case occurs when an execution sequence enters a loop without any state belonging
to $L_i^{TC}$ or $U_i^{TC}$.
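The practical procedure can be sketched as an on-the-fly verdict function. This is a simplified illustration under several assumptions: counters are kept per set rather than per state, the bounds are read as non-strict for Fail and strict for Pass, and a step budget `max_steps` stands in for the global timer. The function name and data encoding are hypothetical.

```python
def verdict(visited, pairs, counters, max_steps):
    """visited: test-case states in visiting order; returns "Fail" or "Pass"."""
    l_count = [0] * len(pairs)
    u_count = [0] * len(pairs)
    for step, q in enumerate(visited):
        if step >= max_steps:
            return "Pass"                       # global timer expired
        for i, (L, U) in enumerate(pairs):
            if q in L:
                l_count[i] += 1
            if q in U:
                u_count[i] += 1
        # Early Pass: every pair has exceeded its U bound, Fail is no longer possible.
        if all(u > cu for u, (_, cu) in zip(u_count, counters)):
            return "Pass"
        # Fail: some pair saw enough L-states while staying within its U bound.
        if any(l >= cl and u <= cu
               for l, u, (cl, cu) in zip(l_count, u_count, counters)):
            return "Fail"
    return "Pass"

safety = [({3}, set())]                         # in the spirit of a single pair ({3}, {})
v_fail = verdict([1, 2, 3], safety, [(1, 0)], max_steps=10)
v_pass = verdict([1, 2, 2, 2], safety, [(1, 0)], max_steps=10)
v_timeout = verdict([1, 2, 3], safety, [(1, 0)], max_steps=2)
v_bounded = verdict([1, 2, 1, 2], [({2}, {1})], [(2, 1)], max_steps=10)
```

Choosing larger values of cl or smaller values of cu makes a Fail verdict harder to reach, which reflects how the bounds tune the test at execution time.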



3 Test Generation 

We propose in this section an algorithm to automate the generation of “property
oriented” test cases. This algorithm takes as input a (partial) specification $S_0$ of
a given implementation IUT, an observer (a deterministic parameterised Rabin
automaton) $Obs = (O, \mathcal{T}^O, C^O)$ characterising the negation of a linear property
$\mathcal{P}$, and a test architecture $TA = (A_c, A_u)$. Test cases produced by this algorithm
are sound in the sense that, when executed against the IUT, a Fail verdict is
produced only if this IUT does not satisfy property $\mathcal{P}$.

The test generation algorithm we propose is based on two steps: generation
of a so-called test graph (TG, for short) and test case selection from this TG.
We first describe these two steps at an abstract level, and then we discuss some
implementation issues.

3.1 Test Graph 

The purpose of the test graph is to gather a set of execution sequences,
computed from the specification $S_0$ and the observer Obs, compliant with the test
architecture TA (i.e., executable by an external tester), and able to witness the
non-satisfiability of $\mathcal{P}$ for a given IUT. Each controllable sub-graph of this TG
could then be turned into an executable test case for property $\mathcal{P}$ (as defined in
the previous section).

However, even for a simple property and with a restricted test architecture, it
appears that the number of sequences matching this definition is quite large: in
fact it could be any sequence over $A_c \cup A_u$ recognised by $O$. Considering such a
“complete” test graph would be of limited practical interest in this context: most
of these sequences are likely to be very “far” from the actual IUT behaviour,
and executing them would not provide very useful information. Consequently,
the probability of extracting a “relevant” controllable test case from this large set
would be rather low. Therefore we need some heuristic to restrict this test graph
to the most promising execution sequences.

The heuristic we propose here to compute the test graph is to make the best use of
the information provided by the specification, proceeding as follows:

1. First, we transform the initial specification $S_0$ by computing its deterministic
suspension automaton $S$ with respect to the test architecture TA:
$S = \delta(det(S_0, A_c \cup A_u))$. This operation preserves the observable/controllable
language of the specification: $\mathcal{L}(S) = \mathcal{L}(S_0) \downarrow (A_c \cup A_u)$.

2. Then, we select the longest sequences of $\mathcal{L}(S)$ matching a prefix of
$\mathcal{L}(O)$. Such sequences are the most promising candidates to witness the
non-satisfiability of $\mathcal{P}$, since they belong both to the specification (and then
are supposed to be executable on the IUT), and to a prefix of $\mathcal{L}(O)$.

3. Finally, these sequences are extended to cover complete sequences of
$\mathcal{L}(O)$. Note that if the specification already contains a complete sequence of
$\mathcal{L}(O)$ (and not only one of its proper prefixes) this means that the specification
itself does not satisfy $\mathcal{P}$.

From a more formal point of view the test graph we compute is a
parameterised Rabin automaton $(TG, \mathcal{T}^{TG}, C^{TG})$: the IOLTS TG gathers the set of
execution sequences described above, and the pair table $\mathcal{T}^{TG}$ and counter sets
$C^{TG}$ are inherited from $\mathcal{T}^O$ and $C^O$. This is described in definition 5 below,
proceeding in two steps:

1. Computation of an asymmetric product $\otimes$ between $S$ and $O$. The purpose of
this product is to mark each state $p_S$ of $S$ with a corresponding state $p_O$ of
$O$, such that $p_S$ and $p_O$ are reachable from the initial states by “matching”
execution sequences (rules R1 and R2).

2. Selection of the longest execution sequences of $S \otimes O$ matching a prefix
of $\mathcal{L}(O)$, and extension of these sequences to obtain a complete sequence
of $\mathcal{L}(O)$. This is performed by rule R4: a transition
$(p_S, p_O) \xrightarrow{a} (p_S, q_O)$ is
added to the transition relation $T^{TG}$ iff such a transition exists in $O$ but not
in $S \otimes O$.



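Definition 5. Let $TA = (A_c, A_u)$ be a test architecture, $S_0$ a specification and
$S = (Q^S, A^S, T^S, q_0^S)$ its deterministic suspension automaton with respect to TA:
$S = \delta(\mathrm{det}(S_0, A_c \cup A_u))$. Let $(O, \mathcal{T}^O, \mathcal{C}^O)$ be an observer with
$O = (Q^O, A^O, T^O, q_0^O)$ and $\mathcal{T}^O = ((L_1^O, U_1^O), (L_2^O, U_2^O), \ldots, (L_k^O, U_k^O))$ such that
TA is compliant with O. We define the Parameterised Rabin automaton
$(TG, \mathcal{T}^{TG}, \mathcal{C}^{TG})$ where $TG = (Q^{TG}, A^{TG}, T^{TG}, q_0^{TG})$, such that $Q^{TG} \subseteq Q^S \times Q^O$,
$A^{TG} = A^S$, $q_0^{TG} = (q_0^S, q_0^O)$, and $Q^{TG}$, $T^{TG}$ are obtained as follows:

1. Let $Q^{\otimes}$ and $T^{\otimes}$ be the smallest sets satisfying rules R0, R1 and R2 below:

$$q_0^{TG} \in Q^{\otimes} \qquad [R0]$$

$$\frac{(p_S, p_O) \in Q^{\otimes},\quad p_S \xrightarrow{a}_{T^S} q_S,\quad p_O \xrightarrow{a}_{T^O} q_O}{(q_S, q_O) \in Q^{\otimes},\quad (p_S, p_O) \xrightarrow{a}_{T^{\otimes}} (q_S, q_O)} \qquad [R1]$$

$$\frac{(p_S, p_O) \in Q^{\otimes},\quad p_S \xrightarrow{a}_{T^S} q_S,\quad \neg\,(p_O \xrightarrow{a}_{T^O})}{(q_S, p_O) \in Q^{\otimes},\quad (p_S, p_O) \xrightarrow{a}_{T^{\otimes}} (q_S, p_O)} \qquad [R2]$$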



2. Then, $Q^{TG}$ and $T^{TG}$ are the smallest sets satisfying rules R3 and R4 below:
$Q^{\otimes} \subseteq Q^{TG}$, $T^{\otimes} \subseteq T^{TG}$ [R3]

(ps,Po) G Q™, 

PO -^ro qo, CT G \ A^)* 

^qs- {{ps,po) (qs,qo)) 



(ps,qo) G Q™, {ps,Po),-^TTG,{ps,qo) 



[R4] 



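3. The pair table $\mathcal{T}^{TG}$ is equal to $((L_1^{TG}, U_1^{TG}), (L_2^{TG}, U_2^{TG}), \ldots, (L_k^{TG}, U_k^{TG}))$ where
$L_i^{TG}$ and $U_i^{TG}$ are defined as follows:

$L_i^{TG} = \{(p_S, p_O) \in Q^{TG} \mid p_O \in L_i^O\}$
$U_i^{TG} = \{(p_S, p_O) \in Q^{TG} \mid p_O \in U_i^O\}$

4. The set of counters is directly inherited from Obs:

$\mathcal{C}^{TG} = \mathcal{C}^{O}$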



3.2 Test Cases Selection 

The purpose of the test case selection is to generate a particular test case TC
from the test graph TG. Roughly speaking, it consists in “extracting” a subgraph
of TG that is controllable and contains at least one sequence of $\mathcal{L}(O)$.

Clearly, to belong to $\mathcal{L}(O)$, an execution sequence of O has to reach a cycle
containing a state belonging to some distinguished set $L_i^O$ (for some i) of the pair
table associated to O. Conversely, any sequence of O not leading to a strongly
connected component of O containing a state of $L_i^O$ cannot belong to $\mathcal{L}(O)$.
Therefore, we first define on TG the predicate L2L (for “leads to L”), to denote
the set of states leading to such a strongly connected component:

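$L2L(q) \equiv \exists (q_1, q_2, \omega_1, \omega_2, \omega_3).\ (q \overset{\omega_1}{\Longrightarrow}_{T^{TG}} q_1 \overset{\omega_2}{\Longrightarrow}_{T^{TG}} q_2 \overset{\omega_3}{\Longrightarrow}_{T^{TG}} q_1 \text{ and } \exists i.\ q_2 \in L_i^{TG})$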

We can now define a subset of the relation $T^{TG}$ that is controllable and contains at
least one sequence of $\mathcal{L}(O)$. This subset, computed by the function select below,
contains all non controllable transitions of $T^{TG}$ (labelled by an element of $A_u$),
and at most one (randomly chosen) controllable transition of $T^{TG}$ leading to a
state of L2L when several such transitions exist from a given state of TG:

$select(T^{TG}) = \{(p, a, q) \in T^{TG} \mid a \in A_u \text{ or } a = \text{one-of}(\{a_i \in A_c \mid p \xrightarrow{a_i}_{T^{TG}} q_i \text{ and } L2L(q_i)\})\}$

Note that this function preserves the reachability of states belonging to

Finally, this subset of $T^{TG}$ remains to be extended with all (non controllable)
actions of $A_u$ not explicitly appearing in $T^{TG}$, to ensure that the test case
execution will never be stopped by reception of an unexpected event. The definition
of a test case TC is then the following:

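Definition 6. Let $(TG, \mathcal{T}^{TG}, \mathcal{C}^{TG})$ be a test graph with $TG = (Q^{TG}, A^{TG}, T^{TG}, q_0^{TG})$
and $TA = (A_c, A_u)$ a test architecture. A test case is a Parameterised
Rabin automaton $(TC, \mathcal{T}^{TC}, \mathcal{C}^{TC})$ with $TC = (Q^{TC}, A^{TC}, T^{TC}, q_0^{TC})$ such that $q_0^{TC} = q_0^{TG}$,
$A^{TC} = A^{TG} \cup A_u$, $Q^{TC}$ is the subset of $Q^{TG}$ reachable by $T^{TC}$ from $q_0^{TC}$,
$\mathcal{T}^{TC}$ is the restriction of $\mathcal{T}^{TG}$ over $Q^{TC}$, and $T^{TC}$ is defined as follows:

$T^{TC} = select(T^{TG}) \cup \{(p, a, p) \mid a \in A_u \text{ and } \nexists q.\ (p, a, q) \in T^{TG}\}$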



3.3 Implementation Issues 

We briefly sketch the concrete algorithms that could be used to implement the 
test case generation method proposed in this section. The objective here is not to 
provide a detailed implementation description (beyond the scope of this paper), 
but rather to give some indications on its algorithmic complexity. A possible 
(and simple) approach to compute a test case TC from a specification S, an 
observer Obs and a test architecture TA is to proceed as follows: 

1. computation of S (determinisation and suspension of $S_0$) and computation
of the sets $Q^{\otimes}$ and $T^{\otimes}$ introduced in definition 5. These operations can be done
during a joint traversal of S and O.

2. computation of the test graph TG (sets $Q^{TG}$ and $T^{TG}$) from the previous
result. This can be done through a single traversal of $Q^{\otimes}$ and $T^{\otimes}$.

3. computation of the strongly connected components of TG containing a
distinguished state of $L_i^{TG}$, using for instance Tarjan’s algorithm [18]. This
operation also gives the L2L predicate.

4. test case selection (computation of function select) using a backward
traversal of TG.

Apart from the determinisation phase, all these operations remain linear in the
number of transitions of the LTS considered, but the test graph has to be explicitly
stored. However, some of the algorithms proposed in the TGV tool could
certainly be used to perform most of these operations on an on-the-fly basis. This
point has not been investigated at this time.
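Steps 3 and 4 above can be illustrated with a small sketch. It is not the authors' implementation: the graph encoding (a dictionary from each state to a list of (action, successor) pairs), the function names and the toy graph are assumptions made for this example, and a state is taken to satisfy L2L only when the cycle through a distinguished state is non-empty. Strongly connected components are computed with Tarjan's algorithm, and the L2L states are then collected by a backward traversal.

```python
from collections import defaultdict


def tarjan_scc(graph):
    """Strongly connected components of graph: state -> [(action, successor)]."""
    index, low, on_stack = {}, {}, set()
    stack, sccs, counter = [], [], [0]

    def visit(v):
        index[v] = low[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for _, w in graph.get(v, []):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:          # v is the root of a component
            comp = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.add(w)
                if w == v:
                    break
            sccs.append(comp)

    states = set(graph) | {w for succs in graph.values() for _, w in succs}
    for v in sorted(states):
        if v not in index:
            visit(v)
    return sccs


def l2l_states(graph, distinguished):
    """States from which a cycle through a distinguished state is reachable."""
    targets = set()
    for comp in tarjan_scc(graph):
        has_cycle = len(comp) > 1 or any(
            w in comp for v in comp for _, w in graph.get(v, []))
        if has_cycle and comp & distinguished:
            targets |= comp
    # Backward traversal from the selected components.
    preds = defaultdict(set)
    for v, succs in graph.items():
        for _, w in succs:
            preds[w].add(v)
    result, todo = set(targets), list(targets)
    while todo:
        v = todo.pop()
        for u in preds[v]:
            if u not in result:
                result.add(u)
                todo.append(u)
    return result


# Hypothetical four-state graph; state 4 is a sink and cannot lead to a cycle.
toy = {1: [("req", 2), ("lock", 4)], 2: [("open", 3)],
       3: [("other", 3), ("close", 1)], 4: []}
print(sorted(l2l_states(toy, distinguished={3})))
```

Both passes visit each transition a bounded number of times, which is in line with the linear cost mentioned above; the recursive formulation is kept for readability and would need to be made iterative for large graphs.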

4 Example 

4.1 System Description 

We consider a control system for an automatic door. The behaviour of the
controller is the following: it can receive a request for opening the door
(REQOPEN); the door is then successively opened (OPEN) and closed (CLOSE). It
can also receive a LOCK request, whose effect is to lock the door definitively,
or any other request (OTHER), which is silently ignored. All these actions are
supposed to belong to the test architecture. A possible specification of the
controller is the IOLTS shown in figure 4.

[Figure: IOLTS with transitions labelled input(Other), input(Lock), input(ReqOpen), output(open) and output(close).]

Fig. 4. Specification



The property we want to test on this system is: whenever the door is open, then it
should be closed before a given amount of time (to be specified at test time). The
negation of this property (the observer) is modelled by the Parameterised Rabin
automaton of figure 5, where the a label denotes any observable action other
than CLOSE (including $\delta$). We now assume that the IUT does not quite conform
to the specification. In particular, it may spontaneously output an ABORT action
and re-enter the initial state. The corresponding IOLTS is pictured in figure 8.

[Figure: Parameterised Rabin automaton with transitions labelled output(open), output(close) and a, annotated with the sets L and U and their counters.]

Fig. 5. Observer



4.2 Test Graph Generation 

The first step consists in generating a test graph from the specification and the 
observer. The corresponding deterministic parameterised Rabin automaton is 
shown on figure 6. Note that the sets L and U are inherited from the observer. 
On this test graph, the execution sequences belonging to the language of the 
observer are the ones ending by (namely in states 32 and 42). 



[Figure: test graph with transitions labelled input(Lock), input(Other), input(ReqOpen), output(open) and output(close); states include 41, 42 and 32, some annotated with L or U.]

Fig. 6. Test Graph



4.3 Test Selection and Test Execution 

From the test graph we can then extract some particular test cases, for instance
the one pictured in figure 7.

Transitions labelled with a indicate that the test case is output complete.
Executing this test case may exhibit a possible incorrect behaviour of the IUT
(figure 8), in which an occurrence of the ABORT action in state 32 leads to a Fail
verdict (since the IUT is deadlocked in this state).

More precisely, each time states 32, 11 or 21 are reached, their respective
counters are incremented. So during the test execution, the counter associated
with state 32 can overflow if an ABORT action occurs. Of course, this scenario
is not guaranteed to appear since this incorrect behaviour is not fully
controllable by the tester.

[Figure: test case with transitions labelled input(Other), input(ReqOpen), output(open) and output(close); states include 32.]

Fig. 7. Test Case



[Figure: IOLTS with transitions labelled input(Other), input(Lock), input(ReqOpen), output(open), output(close) and output(abort).]

Fig. 8. IUT



5 Conclusion 

In this paper, we have proposed an approach to automatically produce test cases
that check the satisfiability of a linear property on a given implementation.
Parameterised test cases are generated from a (possibly partial) specification of
the IUT, the (bounded liveness) property itself being expressed by a Parameterised
Rabin automaton. The resulting test case can then be instantiated only
at test time, depending on the test environment considered (for instance the
target architecture, or the actual communication structure between the tester
and the IUT, etc.). This approach has been formally defined, and a practical
test generation algorithm has been sketched.

The objective of this work is to extend the framework of conformance testing,
already well established in the telecommunication area, to other contexts or
application domains. We believe that a prerequisite was to make this framework
more flexible, for instance by allowing the use of partial specifications, or by
allowing the validation of explicit properties. This is a first step in this direction.

This work can now be extended in several directions. First, we need to prototype
the algorithms we have proposed to better estimate their performance
and possible optimisations. Then, their application to various case studies will
certainly allow us to improve the test selection strategy (possibly using TGV-like
test purposes in combination with a property). Finally, the use of static analysis
techniques (for instance as presented in [2]) could also improve the
efficiency of the test generation algorithm by focusing on the most promising parts
of the specification.

Abstract. The problem of computing Unique Input/Output sequences
(UIOs) is NP-hard. Genetic algorithms (GAs) have been proven to be
effective in providing good solutions for some NP-hard problems. In this
work, we investigated the construction of UIOs using GAs. We defined a
fitness function to guide the search for potential UIOs and introduced a DO
NOT CARE character to improve the GA’s diversity. Experimental
results suggest that, in a small system, the performance of the GA-based
approaches is no worse than that of random search while, in a more
complex system, the GA-based approaches outperform random search.

Keywords: FSMs, UIOs, Conformance Testing, Genetic Algorithms, 
Optimisation 



1 Introduction 

Finite state machines (FSMs) have been used for modelling systems in various
areas such as sequential circuits, software and communication protocols
[1,2,4,9,10,11,12]. Four test sequence generation methods are discussed and
compared in [4], namely, Transition Tours (T-Method), Unique Input/Output
Sequence (U-Method), Distinguishing Sequence (D-Method), and Characterizing
Set (W-Method). The last three methods are known as formal methods since
they not only check the transitions, but also verify the states. In terms of
fault coverage, the U-, D-, and W-Methods achieve better performance than
the T-Method does, while exhibiting no significant difference among themselves [4].

Among the formal methods, the U-Method is popular since it benefits from the
following facts: (1) Not all FSMs have a Distinguishing Sequence (DS), but nearly all
FSMs have UIOs for each state [6]; (2) The length of a UIO is no longer than that of a
DS; (3) While UIOs may be longer than a characterising set, in practice UIOs
often lead to shorter test sequences. Unfortunately, computing UIOs is NP-hard
[2]. Lee et al. [2] note that adaptive distinguishing sequences and UIOs may
be produced by constructing a state splitting tree. However, no rule is explicitly
defined to guide the construction of input sequences. Naik [5] proposes an
approach to construct UIOs by introducing a set of inference rules. Some minimal
length UIOs are found. These are used to deduce some other states’ UIOs. A
state’s UIO is produced by concatenating a sequence leading to another state, whose
UIO has been found, with that state’s UIO sequence. Although it reduces
computational complexity, the inference rule inevitably increases a UIO’s length,
which will consequently add more costs to the forthcoming test.

Genetic algorithms (GAs) have proven efficient in search and optimisation 
[7] and have shown their effectiveness in providing good solutions to NP-hard 
problems such as the Travelling Salesman Problem. This work investigates the 
use of GAs for constructing UIOs from an FSM. An initial population is produced 
by randomly generating input sequences. This population is used to explore 
potential UIOs. Based on the state splitting tree, a fitness function is defined 
to evaluate the quality of the input sequences. This fitness function encourages 
candidates to split the set of all states into more discrete units and punishes 
the length of the sequences. Roulette wheel selection and uniform crossover are 
implemented. Simulation results are also presented and discussed. During the 
evolutionary computation, good solutions found are stored in a database. This 
database can be used to preserve the information lost during the computation 
and to further optimise UIOs’ length. 

This paper is organised as follows: FSMs are briefly reviewed in section 2. 
A simple GA is introduced in section 3. Experiments and corresponding results 
are described in section 4. Finally, conclusions are drawn in section 5. 

2 Preliminaries 

2.1 Finite State Machines 

An FSM M is defined as a quintuple $(I, O, S, \delta, \lambda)$ where I, O, and S are finite
and nonempty sets of input symbols, output symbols, and states, respectively;
$\delta : S \times I \to S$ is the state transition function; and $\lambda : S \times I \to O$ is the output
function. When the machine is in a current state $s \in S$ and receives an input
$a \in I$, it moves to the next state $\delta(s, a)$ and produces output $\lambda(s, a)$.

An FSM M can be viewed as a directed graph G = (V, E), where the set of
vertices V represents the state set S of M and the set of edges E represents the
transitions. An edge has label i/o where $i \in I$ and $o \in O$ are the corresponding
transition’s input and output. Figure 1 illustrates an FSM represented by its
corresponding directed graph. Throughout this paper, where for some state s
and input x no transition from s with input x is shown, this will be interpreted
as the input of x in s leading to an error message being output and the FSM
moving to a special error state.
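A minimal sketch of this model is given below. The class name, the method names, the labels used for the error state and error output, and the three-state machine are all hypothetical choices for illustration (the machine is not the one of Figure 1). The transition table plays the role of both $\delta$ and $\lambda$, and an absent entry is treated as a move to the error state.

```python
class FSM:
    """Deterministic Mealy machine: (state, input) -> (next state, output)."""

    def __init__(self, transitions, error_state="ERR", error_output="error"):
        self.transitions = dict(transitions)
        self.error_state = error_state
        self.error_output = error_output

    def step(self, state, symbol):
        # A transition that is not shown leads to the error state
        # and produces the error output.
        return self.transitions.get((state, symbol),
                                    (self.error_state, self.error_output))

    def run(self, state, inputs):
        """Apply an input sequence; return the final state and the outputs."""
        outputs = []
        for symbol in inputs:
            state, out = self.step(state, symbol)
            outputs.append(out)
        return state, tuple(outputs)


# Hypothetical machine with I = {a, b} and O = {x, y}.
TOY = FSM({
    ("s1", "a"): ("s2", "x"), ("s1", "b"): ("s1", "y"),
    ("s2", "a"): ("s3", "x"), ("s2", "b"): ("s1", "x"),
    ("s3", "a"): ("s1", "y"), ("s3", "b"): ("s2", "y"),
})

print(TOY.run("s1", "ab"))
print(TOY.run("s1", "c"))   # 'c' is not defined in s1, so the error state is entered
```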

Two states $s_i$ and $s_j$ are said to be equivalent if and only if for every input
sequence $\alpha \in I^*$ the machine produces the same output sequence, $\lambda(s_i, \alpha) = \lambda(s_j, \alpha)$.
Machines $M_1$ and $M_2$ are equivalent if and only if for every state in $M_1$
there is a corresponding state in $M_2$, and vice versa. A machine is minimal
(reduced) if and only if no two states are equivalent. It will be assumed that
any FSM being considered is minimal since any (deterministic) FSM can be
converted into an equivalent (deterministic) minimal FSM [3]. An FSM is
completely specified if and only if for each state $s_i$ and input a, there is a specified
next state $s_{i+1} = \delta(s_i, a)$, and a specified output $o_i = \lambda(s_i, a)$. Otherwise, the
machine is partially specified. An FSM is strongly connected if, given any ordered
pair of states $(s_i, s_j)$, there is a sequence of transitions that moves the FSM from
$s_i$ to $s_j$.

Fig. 1. A Finite State Machine

It will be assumed throughout this article that an FSM is deterministic, 
minimal, completely specified, and strongly connected. 

2.2 Conformance Testing 

Given a specification FSM M, for which we have its complete transition diagram,
and an implementation M', for which we can only observe its I/O behaviour
(“black box”), we want to test to determine whether the I/O behaviour of M'
conforms to that of M. This is called conformance testing. A test sequence
that solves this problem is called a checking sequence. An I/O behavioural difference
between specification and implementation can be caused by either an incorrect
output (an output fault) or an earlier incorrect state transfer (a state transfer
fault). The latter can be detected by adding a final state check after the test of
a transition is finished. A standard test strategy is:

1. Homing: Move M' to an initial state s; 

2. Output Check: Apply an input sequence $\alpha$ and compare the output sequences
generated by M and M' separately;

3. Tail State Verification: Use state verification techniques to check the final
state.

The first step is known as homing a machine to a desired initial state. The
second step checks whether M' produces the desired output sequence. The last
step checks whether M' is in the expected state $s' = \delta(s, \alpha)$. Three techniques
can be used for state verification:

- Distinguishing Sequence (DS)

- Unique Input/Output (UIO)

- Characterizing Set (CS)

Fig. 2. A state splitting tree from an FSM



A distinguishing sequence is a sequence that produces a different output for
each state. Not every FSM has a DS.

A UIO sequence of state $s_i$ is an input/output sequence x/y, that may be
observed from $s_i$, such that the output sequence produced by the machine in
response to x from any other state is different from y, i.e. $\lambda(s_i, x) = y$ and
$\lambda(s_i, x) \neq \lambda(s_j, x)$ for any $j \neq i$. A DS defines a UIO. While not every FSM has
a UIO for each state, some FSMs without a DS have a UIO for each state.

A characterizing set W is a set of input sequences with the property that,
for every pair of states $(s_i, s_j)$, $i \neq j$, there is some $w_k \in W$ such that
$\lambda(s_i, w_k) \neq \lambda(s_j, w_k)$. Thus, the output sequences produced by executing each $w_k \in W$ from
$s_j$ verify $s_j$. This paper will focus on the problem of generating UIOs.
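The UIO condition can be checked directly from its definition. The sketch below is only an illustration: the dictionary encoding of the machine, the function names and the three-state example are assumptions, not part of the original work. It also includes a brute-force search over input sequences of increasing length, whose cost grows exponentially with the length bound; this is the cost that motivates heuristic search.

```python
from itertools import product

# Hypothetical completely specified machine: (state, input) -> (next state, output).
TOY = {
    ("s1", "a"): ("s2", "x"), ("s1", "b"): ("s1", "y"),
    ("s2", "a"): ("s3", "x"), ("s2", "b"): ("s1", "x"),
    ("s3", "a"): ("s1", "y"), ("s3", "b"): ("s2", "y"),
}
STATES = ["s1", "s2", "s3"]


def outputs(fsm, state, inputs):
    """Output sequence produced when inputs is applied from state."""
    out = []
    for symbol in inputs:
        state, o = fsm[(state, symbol)]
        out.append(o)
    return "".join(out)


def is_uio(fsm, states, state, inputs):
    """True if no other state produces the same outputs in response to inputs."""
    expected = outputs(fsm, state, inputs)
    return all(outputs(fsm, other, inputs) != expected
               for other in states if other != state)


def shortest_uio(fsm, states, alphabet, state, max_len):
    """Exhaustive search for a minimum-length UIO input sequence (or None)."""
    for n in range(1, max_len + 1):
        for candidate in product(alphabet, repeat=n):
            seq = "".join(candidate)
            if is_uio(fsm, states, state, seq):
                return seq
    return None


for s in STATES:
    print(s, shortest_uio(TOY, STATES, "ab", s, max_len=3))
```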

2.3 State Splitting Tree 

A state splitting tree is a rooted tree T that is used to construct adaptive
distinguishing sequences or UIOs from an FSM. Each node in the tree has a predecessor
(parent) and successors (children). The predecessor of the root node, which
contains the set of all states, is null. The nodes containing a single (discrete) state
have no successors. These nodes are also known as terminals. A child node is connected to
its parent node through an edge labelled with characters. The edge implies that
the states in the child node are partitioned from the states in the parent node upon
receiving the labelled characters. The splitting tree is complete if the partition
is a discrete partition.

An example is illustrated in Figure 2, where an FSM (different from the one
shown in Figure 1) has six states, namely $S = \{s_1, s_2, s_3, s_4, s_5, s_6\}$. The input
set is $I = \{a, b\}$ while the output set is $O = \{x, y\}$. The root node is indicated by
N(0,0) (in N(i,j), i indicates the layer of the tree in which the node lies and j refers
to the jth node in that layer) and contains the set of all states. Suppose states
$\{s_1, s_3, s_5\}$ produce x simultaneously and arrive at some states when responding to a,
while $\{s_2, s_4, s_6\}$ produce y. Then $\{s_1, s_3, s_5\}$ and $\{s_2, s_4, s_6\}$ are distinguished by a. Two new
nodes rooted from N(0,0) are then generated, indicated by N(1,1) and N(1,2).
Continuing to input the FSM with b, the state initially from $\{s_1\}$ produces x while the states
initially from $\{s_3, s_5\}$ produce y. Then ab distinguishes $\{s_1\}$ from $\{s_3, s_5\}$. Two
new nodes rooted from N(1,1) are generated, denoted N(2,1) and N(2,2). The
same operation can be applied to $\{s_2, s_4, s_6\}$. Repeating this process, we can get
all discrete partitions as shown in Figure 2 (if there is no adaptive distinguishing
sequence, this will not happen). A path from a discrete partition node to the
root node forms a UIO for the state related to this node. When the splitting tree
is complete, we can construct UIOs for each state.
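The splitting performed by a single, fixed input sequence can be sketched as follows. This is a simplified illustration rather than the full tree construction: it follows one path only, and the function name, the dictionary encoding and the three-state machine are assumptions introduced for the example. Each block records the states it started from together with the state currently reached, and is split according to the output observed.

```python
# Hypothetical machine: (state, input) -> (next state, output).
TOY = {
    ("s1", "a"): ("s2", "x"), ("s1", "b"): ("s1", "y"),
    ("s2", "a"): ("s3", "x"), ("s2", "b"): ("s1", "x"),
    ("s3", "a"): ("s1", "y"), ("s3", "b"): ("s2", "y"),
}


def split_along(fsm, states, inputs):
    """Successive partitions of the initial states induced by an input sequence."""
    blocks = [{s: s for s in states}]        # initial state -> current state
    layers = [[set(b) for b in blocks]]
    for symbol in inputs:
        new_blocks = []
        for block in blocks:
            by_output = {}
            for initial, current in block.items():
                nxt, out = fsm[(current, symbol)]
                by_output.setdefault(out, {})[initial] = nxt
            new_blocks.extend(by_output.values())
        blocks = new_blocks
        layers.append([set(b) for b in blocks])
    return layers


for depth, layer in enumerate(split_along(TOY, ["s1", "s2", "s3"], "aa")):
    print(depth, [sorted(block) for block in layer])
```

In this toy machine the input a separates s3 from the other two states, and a second a separates s1 from s2, so the partition is discrete after two inputs.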

Unfortunately, the problem of finding data to build up the state splitting 
tree is NP-hard. This provides the motivation for investigating the use of GAs. 
In the following sections, we will discuss the problem in detail. 

3 Apply GA to FSM 

3.1 Genetic Algorithms 

A genetic algorithm (GA) is a heuristic optimisation technique that simulates 
natural processes, utilizing selection, crossover, mutation and fitness proportion- 
ate reproduction operators. Since Holland’s seminal work (1975) [8], it has been 
applied to a variety of learning and optimisation problems. 

A GA starts with a randomly generated population, each element (chromo- 
somes) being a sequence of variables/parameters. Variable values can be rep- 
resented in binary form, real-number, or even characters. The quality of each 
chromosome is determined by a fitness function that depends upon the problem 
considered. Those of high fitness have a greater probability of multiple repro- 
duction while those of low fitness have a greater probability of being rejected. 

Crossover and mutation are applied to produce new chromosomes. Crossover
exchanges information between randomly selected parent chromosomes by
exchanging parameter values to form children. Single-point crossover, multi-point
crossover and uniform crossover are three major types. Mutation injects
information into the genetic pool by mutating randomly selected parameters according
to the preset mutation probability. Mutation prevents the genetic pool from
premature convergence, namely, getting stuck in local maxima/minima. A flow chart
for a simple GA is presented in Figure 3.
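One generation of such a simple GA can be sketched as below. The code is an illustration under assumptions and not the implementation used in this work: the function names, the placeholder fitness function and all parameter values are hypothetical, chromosomes are plain strings over a character alphabet, and fitness values are assumed to be non-negative so that fitness-proportionate (roulette wheel) selection is well defined.

```python
import random


def roulette_select(population, fitnesses, rng):
    """Fitness-proportionate selection of one individual."""
    total = sum(fitnesses)
    if total <= 0:
        return rng.choice(population)
    pick = rng.uniform(0, total)
    acc = 0.0
    for individual, fit in zip(population, fitnesses):
        acc += fit
        if pick < acc:
            return individual
    return population[-1]


def uniform_crossover(p1, p2, rng):
    """Swap the genes of two equal-length parents position by position."""
    c1, c2 = [], []
    for g1, g2 in zip(p1, p2):
        if rng.random() < 0.5:
            g1, g2 = g2, g1
        c1.append(g1)
        c2.append(g2)
    return "".join(c1), "".join(c2)


def mutate(chromosome, alphabet, rate, rng):
    """Replace each gene, with probability rate, by a different character."""
    genes = list(chromosome)
    for i, g in enumerate(genes):
        if rng.random() < rate:
            genes[i] = rng.choice([a for a in alphabet if a != g])
    return "".join(genes)


def next_generation(population, fitness, alphabet, xrate, mrate, rng):
    """Selection, crossover and mutation producing a pool of the same size."""
    fits = [fitness(ind) for ind in population]
    children = []
    while len(children) < len(population):
        p1 = roulette_select(population, fits, rng)
        p2 = roulette_select(population, fits, rng)
        if rng.random() < xrate:
            p1, p2 = uniform_crossover(p1, p2, rng)
        children.append(mutate(p1, alphabet, mrate, rng))
        children.append(mutate(p2, alphabet, mrate, rng))
    return children[:len(population)]


rng = random.Random(0)
pool = ["".join(rng.choice("abc") for _ in range(10)) for _ in range(30)]
toy_fitness = lambda chrom: chrom.count("a")    # placeholder objective only
for _ in range(50):
    pool = next_generation(pool, toy_fitness, "abc", 0.75, 0.05, rng)
print(max(pool, key=toy_fitness))
```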

3.2 Solution Representation 

When applying a GA, the first question to consider is what representation
should be used. In this work, the chromosomes in the genetic pool are strings
of characters from the input set I. To preserve more information, a DO NOT
CARE character '#' is also considered. We will explain the reason for using
this character in the following sections. When receiving this input, the state of
an FSM remains unchanged and no output is produced. Crossover operated on
two parent chromosomes swaps characters. When a gene (character) is mutated
according to the mutation rate, it is replaced with a character randomly selected
from the rest of the input set, including '#'.

Fig. 3. Flow Chart for a Simple GA



3.3 Fitness Definition 

A key issue is to define a fitness function to (efficiently) evaluate the quality of
solutions. This function should embody two aspects: (1) Solutions should create
as many discrete units as possible. (2) The solution should be as short as possible.
The function needs to make a trade-off between these two points. This work uses
a function that rewards the early occurrence of discrete partitions and punishes
the chromosome’s length. An alternative would be to model the number of state
partitions and the length of the solution as two objectives and then treat the task
as a multi-objective optimisation problem.

We define a fitness function in (2) that is derived from (1). While applying
an input sequence to an FSM, at each stage of a single input, the state splitting
tree constructed is evaluated by equation (1),

$$f(i) = \alpha \frac{x_i e^{(x_i + \delta x_i)}}{l_i^{\gamma}} + \beta \frac{(y_i + \delta y_i)}{l_i} \qquad (1)$$

where i refers to the ith input character. $x_i$ denotes the number of existing
discrete partitions while $\delta x_i$ is the number of new discrete partitions caused
by the ith input. $y_i$ is the number of existing separated groups while $\delta y_i$ is
the number of new groups. $l_i$ is the length of the input sequence up to the ith
element (Do Not Care characters are excluded). $\alpha$, $\beta$ and $\gamma$ are constants.
From the equation it can be seen that, when $x_i$ increases, $f(i)$ will exponentially
increase, while, when an input sequence’s length $l_i$ increases, $f(i)$ is reduced
exponentially. Suppose $x_i$ and $l_i$ change approximately at the same rate, that is
$\delta x_i \approx \delta l_i$; as long as $e^{(x_i + \delta x_i)}$ has faster dynamics than $l_i^{\gamma}$,
$x_i e^{(x_i + \delta x_i)} / l_i^{\gamma}$ will still increase exponentially. If no discrete partition is found
as the input sequence length increases, the fitness function will decrease dramatically.
$x_i e^{(x_i + \delta x_i)} / l_i^{\gamma}$ thus performs two actions: encouraging the early occurrence of
discrete partitions and punishing the increment of an input sequence’s length.
$\beta (y_i + \delta y_i) / l_i$ will also affect $f(i)$ in a linear way. Compared to
$x_i e^{(x_i + \delta x_i)} / l_i^{\gamma}$, it plays a less important role. This
term rewards partitioning even when discrete classes have not been produced.

After all input characters have been examined, the final fitness value for this
input candidate is defined as the average of (1)

$$F = \frac{1}{N} \sum_{i=1}^{N} f(i) \qquad (2)$$

where N is the sequence’s length.
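The sketch below shows how a fitness of this kind might be evaluated. It is an illustration under stated assumptions, not the authors' code. It assumes that (1) has the form $f(i) = \alpha\, x_i e^{(x_i + \delta x_i)} / l_i^{\gamma} + \beta\, (y_i + \delta y_i) / l_i$, that the "groups" are the blocks of the current partition of the initial states, that $f(i)$ is taken as 0 while $l_i = 0$ (a chromosome starting with '#'), and that N counts every character of the chromosome, including '#'. The machine `TOY` and the function name are hypothetical.

```python
import math

# Hypothetical machine: (state, input) -> (next state, output).
TOY = {
    ("s1", "a"): ("s2", "x"), ("s1", "b"): ("s1", "y"),
    ("s2", "a"): ("s3", "x"), ("s2", "b"): ("s1", "x"),
    ("s3", "a"): ("s1", "y"), ("s3", "b"): ("s2", "y"),
}
STATES = ["s1", "s2", "s3"]


def fitness(fsm, states, chromosome, alpha=0.1, beta=1.5, gamma=5.0):
    """Average of the assumed per-input score f(i) over the chromosome."""
    blocks = [{s: s for s in states}]     # initial state -> current state
    length = 0                            # l_i: inputs seen so far, '#' excluded
    total = 0.0
    for symbol in chromosome:
        x = sum(1 for b in blocks if len(b) == 1)   # existing discrete partitions
        y = len(blocks)                              # existing groups
        if symbol != "#":                 # '#' changes neither state nor partition
            length += 1
            new_blocks = []
            for block in blocks:
                by_output = {}
                for initial, current in block.items():
                    nxt, out = fsm[(current, symbol)]
                    by_output.setdefault(out, {})[initial] = nxt
                new_blocks.extend(by_output.values())
            blocks = new_blocks
        dx = sum(1 for b in blocks if len(b) == 1) - x
        dy = len(blocks) - y
        if length > 0:                    # assumption: f(i) = 0 while l_i = 0
            total += (alpha * x * math.exp(x + dx) / length ** gamma
                      + beta * (y + dy) / length)
    return total / len(chromosome) if chromosome else 0.0


for candidate in ("aa", "a#", "##"):
    print(candidate, round(fitness(TOY, STATES, candidate), 4))
```

With these assumptions, a chromosome that reaches the same partition with fewer effective inputs scores higher, which reflects the length penalty described above.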

3.4 Tracking Historical Records 

Mutation prevents a GA from getting stuck in a local maximum/minimum but might
also force a GA to jump out of the global maximum/minimum when it happens to
be there. Solutions provided at the end of evolutionary computation could be
good, but need not be the best found during the process. It is therefore useful
to keep track of those candidates that have produced good or partially good
solutions, and to store them in order to further optimise the final solutions.

Consider the example shown in Figure 4. Suppose that a GA produces a UIO
sequence $U_t$ for state $s_t$, forming a path shown in thin solid arrow lines. During
the computation, another solution $U_t'$ for $s_t$ has been found, forming a path
shown in dotted arrows. The two lines visit a common node at $N_4$. $U_t$ has a
shorter path than $U_t'$ before $N_4$ while it has a longer path after $N_4$. The solution
recombined from $U_t$ and $U_t'$ (indicated in the figure by thick arrow lines), taking
their shorter parts, is better than either of them.

In this work, a database is created to track all candidates that result in the
occurrence of discrete partitions. This database is then used to further optimise
the final solutions through recombination. Solutions for a state, which are of the
same length, are multi-UIOs of this state which can be used in test generation
[9,10].

Fig. 4. Solution Recombination



4 Experiments 

A set of experiments was devised to test the GA’s performance. The first FSM 
studied is the one shown in Figure 1. All minimum-length UIOs of all states are 
presented in Table 1. Roulette Wheel Selection (RWS) and Uniform Crossover 
(UC) are implemented. 



Table 1. UIO Sequences for Figure 1

State  UIOs
s1     aa/xx, ab/xx, ac/xy, ba/xx, bb/xy, ca/yx, cb/yx
s2     b/y
s3     ba/xz, bc/xz, ca/yz, cc/yz
s4     bb/xx, bc/xy
s5     a/z, c/z


In the first experiment, the input space is $I = \{a, b, c\}$. The parameters
(ChrLen: chromosome length; XRate: crossover rate; MRate: mutation rate; PSize:
population size; MGen: maximum number of generations) are set to: ChrLen = 10,
XRate = 0.75, MRate = 0.05, PSize = 30, MGen = 50, $\alpha = 0.1$, $\beta = 1.5$, and $\gamma = 5$.

At the end of computation, by looking at the average fitness values (Figure 5),
we found that the genetic pool converges quite quickly. The curve is comparatively
smooth. However, by examining all individuals (Table 2), we found that the
whole population tends to move to the individuals that start with bb or bc. The
population loses its diversity and converges prematurely. Consequently, only a
few UIOs have been found {b, bb, bc}.

Fig. 5. Input Space {a,b,c}

Table 2. Final sequences from input space {a,b,c}

ID  Sequence    ID  Sequence
1   bccaabcacc  16  bbcaabcacc
2   bbbaabaacc  17  bbbaabaaca
3   bbbaabcacc  18  bbcaababcc
4   bcccabaacb  19  bbbcabbacc
5   bbcbabcacb  20  bbcaabaacc
6   bbbaabcacc  21  bccaabaacc
7   bbcbabcacb  22  bbbaabaacc
8   bbccabcacb  23  bbbcabbacc
9   bbcbabaacb  24  bbcbabcacb
10  bcbbabaacb  25  bbbcabcacc
11  bbcbabcabb  26  bbcaabcacb
12  bbcaabaacc  27  bbbbababcc
13  bccaabaacb  28  bbcaabcacb
14  bbbbabcacc  29  bbccabbacc
15  bbcbabcacb  30  bbbaabbacb

This is not what we expected. To keep the genetic pool diverse, we introduced
a DO NOT CARE character '#'. When receiving this character, the state of
an FSM remains unchanged. The input space is then {a, b, c, #}. We keep the
same values for all other parameters. The average fitness chart is presented in
Figure 6. It can be seen that the curve in Figure 6 is not as smooth as that in
Figure 5, but still shows a general tendency to increase. After examining the genetic
pool, we found that eleven UIOs, {a, b, c, ab, ac, ba, bb, bc, ca, cb, cc}, were found
(Table 3). By retrieving the historical records, we also found {aa} (Table 4). The GA thus
performs well in this experiment.

The reason that the crossover operator can be used to explore information is that
it forces genes to move among chromosomes. Through recombination of genes,
unknown information can be uncovered by new chromosomes. However, the gene
movement exerted by the crossover operator can only happen among different
chromosomes. We call it vertical movement. By using a DO NOT CARE character,
some spaces can be added in a chromosome, which makes it possible for genes
to move horizontally. Therefore, DO NOT CARE makes the exploration more
flexible and, consequently, can help to keep the genetic pool diverse.

Fig. 6. Input Space {a,b,c,#}

Table 3. Final sequences from input space {a,b,c,#}

ID  Sequence          VS  ID  Sequence          VS
1   bbbbccch'^c       bb  16  ^bcbabcabb        bc
2   ab'ibbc'ibbc      ab  17  tt#c6at|c6cb      cb
3   ‘^bd^caaaba       bc  18  t|66ttttbfeaca    bb
4   ^hcachc^c         bc  19  bcttb&Jcfefett    bc
5   b'^bcaca^cb       bb  20  cbcbbc^^ca        cb
6   feb|j6catttt66    bb  21  tt6c#aa6cttc      bc
7   cbacd^ad^b        cb  22  cttattcacc#c      ca
8   acabaaaacc        ac  23  cttaccacjltta     ca
9   battattaaattfc    ba  24  ttbttfecafefecfj  bb
10  ttccaa&ttocc      cc  25  bfettb&attttatl   bb
11  ttc&c(jaa#a6      cb  26  bac6ttcji&ttfe    ba
12  ttb&cmfjaao       bb  27  bbmm              bb
13  ccc6tltt###c      cc  28  c6(jba6ctt6tt     cb
14  aca'^bbaa^b       ac  29  ccccjlccttfeb     cc
15  cb6ttcjlcb|jc     cb  30                    bc

Sequence: candidate sequence. VS: minimum-length UIO.
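The effect of the DO NOT CARE character on gene positions can be illustrated with a few lines of code. The helper names and the example chromosomes are hypothetical and serve only as a sketch: two chromosomes that differ in where '#' is placed encode the same effective input sequence, yet exchanging the genes at one position yields children whose effective sequences differ from both parents.

```python
def effective(chromosome, dont_care="#"):
    """Input sequence actually applied to the FSM: '#' characters are skipped."""
    return chromosome.replace(dont_care, "")


def swap_gene(p1, p2, i):
    """Exchange the genes at position i of two equal-length chromosomes."""
    c1 = p1[:i] + p2[i] + p1[i + 1:]
    c2 = p2[:i] + p1[i] + p2[i + 1:]
    return c1, c2


p1, p2 = "ab#c", "a#bc"
print(effective(p1), effective(p2))      # both encode the same input sequence
c1, c2 = swap_gene(p1, p2, 1)
print(c1, effective(c1))
print(c2, effective(c2))
```

Because the padding shifts the position of b inside the chromosome, a crossover at a fixed index can remove or duplicate that gene in the effective sequence, which is one way to read the horizontal movement described above.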

We organised eleven experiments with the same parameters. By examining 
the solutions obtained in the final genetic pool (historical records are excluded), 
we evaluated the average performance. Table 5 shows that, in the worst case, 8 
out of 12 UIOs are found, which accounts for 66.7%. The best case is 100%. The 
average is 86.4%. 
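These summary figures can be recomputed from the per-experiment counts reported in Table 5. The short check below only restates that arithmetic; the variable names are arbitrary.

```python
# Number of UIOs found in each of the eleven experiments (Table 5), out of 12.
found = [8, 8, 11, 11, 10, 11, 11, 11, 10, 12, 11]
total = 12

average = sum(found) / len(found)          # 114 / 11
worst = 100 * min(found) / total
best = 100 * max(found) / total

print(round(average, 2), round(100 * average / total, 1))
print(round(worst, 1), round(best, 1))
```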

After examining the solutions from different experiments, we found that aa
is the hardest UIO to find, while bb and bc are the most frequent ones that occur
in the final solutions. By checking Table 1, we found a very interesting fact: a
majority of UIOs start with b or c. If individuals happen to be initialised
with baxxxxxxxx, they will distinguish $s_1$, $s_2$ and $s_5$ in the first two steps,
and so achieve high fitness. These individuals are likely to be selected for the
next generation. Individuals initialised with aaxxxxxxxx can distinguish
only $s_1$ and $s_5$ in the first two steps, and achieve lower fitness values. They are
less likely to be selected for reproduction. This fact seems to imply that there
exist multiple modes in the search space. Most individuals are likely to crowd
on the highest peak. Only a very few individuals switch to the lower modes. To
overcome this problem, sharing techniques might help. The application of such
approaches will be left for future work.

Table 4. Historical Records

ID  Sequence       VS  Fitness
1   aattcaa&ttc    aa  6.2784
2   at|a6at|&cttc  aa  4.2605

Table 5. Different Experiment Results

Exp.  UIOs Found  Total  Percent (%)
1     8           12     66.7
2     8           12     66.7
3     11          12     91.7
4     11          12     91.7
5     10          12     83.3
6     11          12     91.7
7     11          12     91.7
8     11          12     91.7
9     10          12     83.3
10    12          12     100
11    11          12     91.7
Avg   10.36       12     86.4

We then compare the performance of the GA with that of random search
(RS). RS is defined as randomly perturbing one bit in an input sequence. 30
input sequences of ten input characters were randomly generated. We repeated
this experiment several times. The results shown in Table 6 are the best ones.
From the table it can be seen that 11 out of 12 UIOs (a, b, c, aa, ab, ac, bb, bc,
ca, cb, cc) are found over these experiments. Only one is missed (ba).
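
To make the two ingredients of this comparison concrete, the following sketch shows how an input sequence can be checked for the UIO property and how a single random-search perturbation might look. The three-state machine, the function names, and the perturbation rule are assumptions made for illustration; they are not the FSM or the implementation used in the experiments.

```python
import random

# Hypothetical 3-state FSM (NOT the machine studied in the paper).
# FSM[state][input] = (next_state, output)
FSM = {
    "s1": {"a": ("s2", "x"), "b": ("s1", "y")},
    "s2": {"a": ("s3", "x"), "b": ("s1", "y")},
    "s3": {"a": ("s1", "y"), "b": ("s3", "x")},
}

def run(fsm, state, inputs):
    """Return the output string produced when `inputs` is applied in `state`."""
    outputs = []
    for symbol in inputs:
        state, out = fsm[state][symbol]
        outputs.append(out)
    return "".join(outputs)

def is_uio(fsm, state, inputs):
    """True if no other state produces the same outputs for `inputs`."""
    own = run(fsm, state, inputs)
    return all(run(fsm, other, inputs) != own for other in fsm if other != state)

def perturb(inputs, alphabet, rng):
    """Assumed random-search step: overwrite one randomly chosen position.

    The new symbol is drawn uniformly, so it may equal the old one.
    """
    pos = rng.randrange(len(inputs))
    return inputs[:pos] + rng.choice(alphabet) + inputs[pos + 1:]
```

In this toy machine, `is_uio(FSM, "s3", "a")` holds because only s3 answers the input a with y, whereas a alone does not separate s1 from s2.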

Since the FSM is comparatively simple, and the UIOs are short, it is not 
difficult to find all UIOs through RS. Thus the GA does not show significant 
advantages over RS. A more complicated system, shown in Figure 7, is therefore 
designed to further test GA’s performance. Unfortunately, no existing UIOs are 
available, which means that we can never be sure that a complete set of UIOs 
has been found. Hence, we will compare the numbers of UIOs found by using 
RS and the GA separately. 

A total of 50 candidates were used in the experiment. All UIOs found,
whether minimum-length or not, are listed to make a comparison. Experiments
on both RS and the GA were repeated several times. The solutions presented in
Table 7 and Table 8 are the best ones. Table 7 lists the UIOs obtained through
RS while Table 8 shows the solutions found by the GA. Comparing these two
tables, we find that the GA finds many more UIOs than RS does. Both RS
and the GA easily find the short UIOs. However, for other UIOs the performance
of the GA appears to be much better than that of RS. For example, the GA finds
bcbcc/xxzzz and cbcbb/xzzyz while RS does not.

Table 6. Solutions by random search

ID   Sequence     VS     ID   Sequence     VS
1    cabbbcbcac   ca     16   cacacacccc   ca
2    ccbcbbbcbc   cc     17   bcbbabcbca   bc
3    bcccaaabcb   bc     18   ccccbbcccb   cc
4    cccccbcbac   cc     19   accbbccacb   ac
5    ccbbacbccc   cc     20   cbccaccccb   cb
6    bccccabcac   bc     21   acbbcbbbbb   ac
7    bccbcbbcbc   bc     22   cbccccbccb   cb
8    ccbbccaabc   cc     23   ccabacbcca   cc
9    bbacccccba   bb     24   accacbabcc   ac
10   cbbbbacccb   cb     25   aabcbacbbb   aa
11   bcaacccccb   bc     26   cabbacacbc   ca
12   cccbcbbcbc   cc     27   bcbccccaac   bc
13   abcbbcccab   ab     28   aabcbccbca   aa
14   bcccccbccc   bc     29   bccccaabbb   bc
15   bbccbccbba   bb     30   acaccbaaba   ac

Fig. 7. A more complicated FSM

We also measure the frequency of hitting UIOs. RS is redefined as routinely
initialising the population. Experimental results show that both methods
frequently hit UIOs of length 3 or less. For UIOs of length 4, however,
RS hits roughly half as often as the GA, and for UIOs of length 5, RS hits
10 times in the first 30 iterations while the GA hits 27 times. All these results
suggest that, in simple systems, it is possible to obtain good solutions through
random search. However, in more complicated systems, especially in those with large
input and state spaces, finding UIOs with random search is likely to be infeasible. By
contrast, the GA seems to be more flexible. In future work, we will apply the GA to
some examples with high input dimensions, or some real applications such as
the IEEE 802.2 Logical Link Control (LLC) Protocol (48 inputs & 65 outputs).
Table 7. UIO Sequences by Random Search

State   UIOs
s1      ca/xx, cb/xx
s2      aa/xz, ab/xy, acc/xxx, bb/xy,
        bcc/xxx, bcba/xxzy
s3      a/z, bb/yz, ca/xz, cb/xy,
        ccb/xxy, ccc/xxx
s4      bc/yx, cba/xzy, ccb/xxz,
        cbca/xzzx, cbcc/xzzz
s5      ab/xz, acb/xxz, bb/yx,
        bcc/yzx, cacc/zxxx, cb/zx
s6      bb/zx, bcc/zzx
s7      a/y, bb/yy, bcc/yzz,
        bcbc/yzyz, cbc/zyz, cc/zz
s8      bb/zy, bcc/zzz, bcbc/zzyz,
        cbca/xzzz, cbcc/xzzx
s9      bb/xz, ca/zz, cc/zx



Table 8. UIO Sequences Found by GA

State   UIOs
s1      ca/xx, cb/xx
s2      aa/xz, ab/xy, aca/xxz, acb/xxy,
        acc/xxx, bb/xy, bcc/xxx, bcbcc/xxzzz
s3      a/z, bb/yz, ca/xz, cb/xy,
        ccb/xxy, ccc/xxx
s4      cbcbb/zxxyx, bc/yx, cba/xzy,
        cbb/xzy, ccb/xxz, cbca/xzzx,
        cbcc/xzzz, cbcbc/xzzyz
s5      ab/xz, acb/xxz, bb/yx, bca/yzz,
        bcc/yzx, cab/zxy, cb/zx
s6      bb/zx, bca/zzz, bcc/zzx
s7      a/y, ba/yy, bb/yy, bca/yzx,
        bcc/yzz, cab/zxz, cbb/zyx, cc/zz,
        cbc/zyz, cc/zz, bcbc/yzyz
s8      ba/zy, bb/zy, bca/zzx,
        bcc/zzz, cbb/xzx, cbca/xzzz,
        cbcc/xzzx, cbcbb/xzzyz
s9      bb/xz, ca/zz, cbb/zyz, cc/zx



5 Conclusion 

In this paper, we investigated the GA’s performance in computing UIOs from an
FSM. We showed that the fitness function can guide the candidates to explore
potential UIOs by encouraging the early occurrence of discrete partitions while
punishing length. We showed that using a DO NOT CARE character can help
to improve the diversity in GAs. Consequently, more UIOs can be explored.
The simulation results for a small system showed that, in the worst case, 67% of
the minimum-length UIOs were found while, in the best case, 100% were found. On
average, more than 85% of the minimum-length UIOs were found from the model
under test. In a more complicated system, the GA found many more UIOs than
random search. The GA was much better than random search at finding the
longer UIOs. These experiments and figures suggest that GAs can provide good
solutions for computing UIOs.

We also found that some UIOs were missed with high probability. This may 
be caused by their lower probability distribution in the search space. Future 
work will consider this problem. 
Abstract. In this paper, we present an algorithm for generating test purpose
descriptions in the form of MSC’s from a given labeled event structure that represents
the behavior of a system of asynchronously communicating extended finite 
state machines. The labeled event structure is a non-interleaving behavior 
model describing the behavior of a system in terms of the partial ordering of 
events. 



1 Introduction 

For testing whether the behavior of an implementation conforms to its designated 
behavior, test cases are to be generated from the specification describing the desig- 
nated behavior. The behavior of a distributed system can be specified e.g. using a sys- 
tem of asynchronously communicating state machines. This model forms the basis 
e.g. of the standardized formal description technique SDL [1]. A system of communi- 
cating state machines implicitly describes all, possibly non-deterministic, sequences 
of inputs and outputs that constitute the designated behavior. Since the number and 
length of these sequences are infinite in general, it is impossible to test each and
every possible behavior, and we face the problem of selecting a set of meaningful test
cases, i.e. a test suite, that allows as many implementation errors as possible to be
discovered at an acceptable cost. This forms the main problem in generating conformance
test suites. 

Each test case in a test suite specifies the actions required to achieve a specific test 
purpose. The test purpose in each case is to check a particular requirement implied by 
the given specification [2]. A test purpose can be expressed e.g. by prose text or by a 
message sequence chart (MSC) describing the behavior to be checked. MSC’s are a 
standardized description technique for the graphical representation of the temporal 
ordering of interactions between components of a distributed system [3]. 

The existing methods for test generation from formal specifications can be roughly 
classified into methods with explicit test purposes and methods with implicit test pur- 
poses. Methods with explicit test purposes require information about the test purposes 
as input in addition to the specification. These methods offer much flexibility to the
test designer and ensure that only executable test cases are generated. However, they
require considerable manual effort to define appropriate test purposes and do not
guarantee systematic test coverage. Methods with implicit test purposes provide test
cases for test purposes that they tacitly assume. These methods generally guarantee a 
complete test coverage w.r.t. the implicit test purposes. However, most of them are 
applicable only to restricted classes of specifications, e.g. to specifications containing 
a single state machine, and they may result in very large test suites. 

Since practically relevant system specifications may be voluminous and compli- 
cated, a manual generation and maintenance of test purposes and test cases is too 
time-consuming and error-prone. It is therefore highly desirable to have test generation
methods with implicit test purposes or at least methods for the automatic generation
of test purposes. Only a few test generation tools, like Autolink [4] in the Telelogic
Tau toolset and TestComposer [5] in the Telelogic ObjectGeode toolset, are applicable
to complex multi-process SDL specifications of a realistic size. These two tools are
based on interleaving models for the behavior of the specified system. This entails 
that the same behavior may be represented by different paths of the reachability 
graph, which differ only in the order of execution of concurrent actions. 

Our approach uses a non-interleaving model (labeled event structure) to alleviate 
the state-space explosion problem. In [6], an algorithm for transforming a system of 
asynchronously communicating state machines into a labeled event structure is given 
and a method with implicit test purposes for generating test cases in Concurrent 
TTCN from a labeled event structure is proposed. To combine the advantages of 
methods with implicit test purposes with those of methods with explicit test purposes, 
this paper aims at the automatic generation of test purposes from labeled event struc- 
tures. From a labeled event structure, test purpose descriptions are generated in form 
of MSC’s by interpreting the parallel paths of the labeled event structure as MSC’s. 
These MSC’s can serve as input for test generation tools with explicit test purposes, 
preferably if those support test generation for distributed testers, as proposed in [7, 8]. 

The rest of this paper is organized as follows. Section 2 introduces the prerequi- 
sites necessary for the proposed approach. Section 3 deals with the generation of test 
purposes from a labeled event structure. Throughout the paper, a simple sliding- 
window protocol serves as an example. Section 4 gives a summary and outlook. 



2 Preliminaries 

2.1 Communicating State Machines 

A system of asynchronously communicating state machines is an obvious semantic 
model for specifications in SDL. Therefore, they form the starting point for our ap- 
proach. 

A system of asynchronously communicating state machines is composed of a set of 
state machines and a set of perfect (i.e. without loss or reordering of messages) FIFO 
queues that connect the state machines with each other and with their environment. 
We consider each state machine as an extended finite state machine (EFSM) without
enabling conditions for transitions. In general, an EFSM is a finite state machine
extended by additional variables that may be used in enabling conditions for transitions,
in calculations to be carried out during the execution of transitions, or for
representing message parameters. An EFSM with enabling conditions can be transformed
into an equivalent one without enabling conditions if the variables influencing the
executability of transitions take on only a finite number of discrete values. An
algorithm for this transformation is given in [9, 6]. This condition does not unduly
restrict the class of specifications to which the algorithm for generating test purposes
is applicable since it is a common practice for a test designer to determine the context
by assigning values to control variables and to parameters of input messages.

We do not require that the EFSM’s form a closed system, but allow open inter- 
faces to the environment. To limit the complexity imposed by the environment, the 
following assumption is made. The environment is assumed to put a message into a 
queue if and only if the associated EFSM is ready to consume it. Hence, a transition 
with a trigger input (excited by a message from the environment) is assumed always 
to be enabled as soon as the EFSM reaches the start state of that transition. This as- 
sumption is common practice in test generation for conformance testing, which is, in 
contrast to robustness testing, confined to the behavior foreseen in the specification. 

Let $m = (M, Q)$ be a system of asynchronously communicating EFSM’s composed
of a set of EFSM’s $M = \{m_1, \dots, m_n\}$ and a set of message queues
$Q = \{q_1, \dots, q_r\}$. A global state of $m$ is an $(n + r)$-tuple
$g = (s_1, \dots, s_n, c_1, \dots, c_r)$ consisting of the states $s_1, \dots, s_n$ of the
EFSM’s and the contents $c_1, \dots, c_r$ of the queues $q_1, \dots, q_r$.

Fig. 1 shows a system of asynchronously communicating EFSM’s modeling a simple
sliding-window protocol. The EFSM’s t, r, and m model the transmitter and receiver
protocol entities and the transmission medium, respectively. To facilitate denominating
the location of actions, we denote input and output actions in the form
loc ("?" | "!") rem msg ["(" par {"," par} ")"] where loc denotes the EFSM where
the action is located, "?" indicates an input action (receiving msg), "!" indicates an
output action (sending msg), rem is the name of the remote EFSM sending msg (in
case of an input action) or receiving msg (in case of an output action), msg is a
message, and par is a message parameter. "*" stands for the environment.

Fig. 1. Example of a system of asynchronously communicating EFSM’s

The example protocol provides the service to transmit data from a user on the 
transmitter side to a user on the receiver side while protecting the receiver against 
overload by attending to acknowledgements. If the number of messages for which the 
acknowledgement is outstanding (the window size) reaches its maximum (2 for sim- 
plicity), the protocol entity on the transmitter side indicates to the user that no more 
messages can be transmitted for the time being. When the protocol entity on the 
transmitter side receives an acknowledgement, then the number of messages for 
which the acknowledgement is outstanding is decremented and new messages can be 
transmitted again. The transmission medium is reliable and does not lose, corrupt, 
add, or reorder messages. 



2.2 Labeled Event Structures 

Definitions. For generating test purposes, we would like to have a model that explic- 
itly describes the behavior of a distributed system in terms of the order of events. A 
labeled event structure fulfils this requirement. A system of asynchronously commu- 
nicating EFSM’s can be “unfolded” into a labeled event structure. In a labeled event 
structure concurrent events are not linearized as in a reachability tree, but lined up 
side by side without order relation. Event structures were introduced in [10] as being 
like acyclic Petri nets without backward branching and with the places removed. 

Actions are a basic element of labeled event structures. The same action can occur
several times in a system run, each time forming a new, distinguishable event. The
actions in the labeled event structures correspond to actions in the underlying systems
of asynchronously communicating EFSM’s: they model the inputs and outputs,
calculations on the context variables, and the setting, resetting and expiration of timers.

A labeled event structure over a set of actions $A$ is a quadruple $(E, \le, \#, l)$ where

• $E$ is a finite set of events;

• $\le \subseteq E \times E$ is a partial order relation in $E$, called causality relation,
such that for all $e \in E$ the set $\{e' \in E \mid e' \le e\}$ is finite (i.e., the number of
causal predecessors of any event is finite);

• $\# \subseteq E \times E$ is an irreflexive and symmetric relation in $E$, called conflict
relation, such that $\forall e, e', e'' \in E: (e \# e' \wedge e' \le e'') \Rightarrow e \# e''$
(i.e., conflicts are inherited: if an event $e$ is in conflict with some event $e'$, then it is
also in conflict with all causal successors of $e'$);

• $l: E \to A$ is a labeling function assigning an action to each event.

$e \le e'$ means that if the events $e$ and $e'$ both happen, then $e$ must happen before
$e'$. $e \# e'$ means that the events $e$ and $e'$ cannot both happen in a single run of the
system. If two events are neither causally related nor in conflict, then they are concurrent
to each other and both can occur in any order: either $e$ before $e'$, $e$ and $e'$ at
the same time, or $e'$ before $e$. All events occurring in the same EFSM are either
causally related or in conflict, but not concurrent to each other.

A labeled event structure is interpreted informally as follows: An event can occur 
if all its causal predecessors have occurred and no conflicting event has occurred yet. 

Let $(E, \le, \#, l)$ be a labeled event structure and $C \subseteq E$ be a subset of its
events. $C$ is causally closed if $\forall e \in C\ \forall e' \in E: (e' \le e \Rightarrow e' \in C)$.
$C$ is conflict-free if $\forall e, e' \in C: \neg(e \# e')$. $C$ is a configuration of the labeled
event structure if it is causally closed and conflict-free. That means, a configuration is a
set of events that have occurred by some stage in executing a labeled event structure.
The necessary configuration $[e]$ of an event $e \in E$ of a labeled event structure is the
subset of events that includes $e$ and all causal predecessors of $e$, but not any other
events, i.e. $[e] = \{e' \in E \mid e' \le e\}$. All events that have to occur prior to an event
$e$ belong to the necessary configuration of $e$. Events that are concurrent to $e$ do not
belong to the necessary configuration of $e$.
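
These definitions translate almost literally into set operations. The sketch below assumes a hypothetical encoding that is not taken from the paper: `causes` is a set of pairs `(a, b)` with a ≤ b and a ≠ b (already transitively closed), and `conflicts` is a set of two-element frozensets holding the full conflict relation.

```python
def is_causally_closed(config, causes):
    """Every causal predecessor of an event in `config` is also in `config`."""
    return all(a in config for (a, b) in causes if b in config)

def is_conflict_free(config, conflicts):
    """No two distinct events of `config` are in conflict."""
    return not any(
        frozenset((a, b)) in conflicts for a in config for b in config if a != b
    )

def is_configuration(config, causes, conflicts):
    """A configuration is a causally closed, conflict-free set of events."""
    return is_causally_closed(config, causes) and is_conflict_free(config, conflicts)

def necessary_configuration(event, causes):
    """[e]: the event itself plus all of its causal predecessors."""
    return {event} | {a for (a, b) in causes if b == event}

# Tiny illustrative structure: e1 precedes e2 and e3, which exclude each other.
causes = {("e1", "e2"), ("e1", "e3")}
conflicts = {frozenset(("e2", "e3"))}
```

With this structure, `{"e1", "e2"}` is a configuration, `{"e2"}` is not causally closed, and `{"e1", "e2", "e3"}` is not conflict-free.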

Each configuration of a labeled event structure constructed from a system of
asynchronously communicating EFSM’s corresponds to a global state of the system. The
final state $gs(C)$ of a configuration $C$ of a labeled event structure constructed from a
system of asynchronously communicating EFSM’s $m$ is the global state of $m$ reached
after all events $e \in C$, but no other events, have occurred.

The construction of a labeled event structure from a system of asynchronously 
communicating EFSM’s can be cut off at different points, leading to different event 
structures. The labeled event structure obtained by unfolding a system of asynchro- 
nously communicating EFSM’s as much as possible is referred to as the labeled event 
structure of the system. Only a complete prefix of the labeled event structure of a 
system of asynchronously communicating EFSM’s is constructed in our approach. A 
prefix of the labeled event structure $(E, \le, \#, l)$ is a labeled event structure
$(E', \le', \#', l')$ induced by a causally closed subset of events $E' \subseteq E$. A prefix
of the labeled event structure of a system of asynchronously communicating EFSM’s is
complete if it contains a configuration $C$ for each reachable global state $g$ of the
system such that

• $g = gs(C)$, i.e., $g$ is represented by $C$, and

• for each transition $g \xrightarrow{\omega} g'$ enabled in $g$, where $\omega$ comprises the
actions $\mu, \nu_1, \dots, \nu_p$, the prefix contains a configuration
$C' = C \cup \{e, e_1, \dots, e_p\}$ with $e, e_1, \dots, e_p \notin C$ and $l(e) = \mu$,
$l(e_1) = \nu_1$, ..., $l(e_p) = \nu_p$.

A maximal configuration is a configuration to which no more events of the complete
prefix of the labeled event structure can be added. An event $e$ is a maximal event
of a configuration $C$ if there does not exist any other event $e' \in C$ with $e \le e'$.

Graphical Representation. A labeled event structure is represented as a graph where 
vertices represent events, directed edges lead to the immediate causal successors of an 
event, and undirected dashed edges connect events in immediate conflict. Next to an 
event e its label l(e) is indicated. The graph of a labeled event structure is cycle-free.
The set of events occurring in the same EFSM induces a subgraph that is a directed 
tree. We draw the subgraphs for the parallel EFSM’s with their edges in parallel. 
Fig. 2 shows a complete prefix of the labeled event structure of the system of
asynchronously communicating EFSM’s in Fig. 1. The complete prefix is annotated 
with the global states at cut-off points and at recursion points. The prefix may be 
expanded by appending the sub-structures starting with the corresponding global 
states to the cut-off points. 

Construction of a Labeled Event Structure. The algorithm for unfolding systems 
of asynchronously communicating state machines into labeled event structures re- 
sembles the reduced reachability analysis from [11, 12], yet the results are taken 
down in the form of event structures. These reduced reachability algorithms aim at 
alleviating the state explosion problem and yield reduced reachability trees whose 
nodes represent only certain reachable global states and whose directed edges repre- 
sent sets of transitions concurrently executable in a certain global state. Intermediate 
global states reached while executing a set of concurrent transitions are not explicitly 
represented. 

For finding cut-off points suitable for a complete prefix of the labeled event structure,
[6] takes up an approach for coping with the state explosion problem in analyzing
Petri nets with finite state space [13, 14]. The main idea can be outlined as follows:
An event is a cut-off event if its necessary configuration has the same final state
as the necessary configuration of another event already contained in the unfolding.
The unfolding can be cut off after these events since all events appended after the
cut-off events would lead to states already covered by the prefix. [13] presents an
algorithm for constructing a finite prefix of the unfolding of a Petri net. The prefix is
complete with respect to the reachable markings of the Petri net. As the complete
prefix of the unfolding constructed according to [13] is sometimes larger than necessary,
[14] improves the algorithm such that a complete prefix is constructed that is minimal
in a certain sense. The algorithm in [14] is applicable to n-safe Petri nets with $n \ge 1$.

Fig. 2. A complete prefix of the labeled event structure for the example from Fig. 1

How a testing equivalent labeled event structure or its complete prefix can be con- 
structed from a given system of asynchronously communicating state machines is 
treated in detail in [6]. The approach is applicable if all state machines of the system 
have a finite number of states and all queues of the system are bounded. This is not an 
undue restriction as in many cases an unbounded growth of the state space can be 
avoided by appropriate design criteria. 



3 Test Generation Approach 

3.1 Starting Point 

The starting point for the generation of test purposes is a complete prefix of the
labeled event structure constructed from a system of asynchronously communicating
EFSM’s m. It forms a semantic model of a given specification of the implementation
under test (IUT) embedded in a test context and hence models the behavior perceivable
at the system boundaries during black box testing. The events that involve an
interaction with the environment represent events occurring at points of control and
observation (PCO’s), i.e. at points where a test system may interact with the IUT and
the test context.

As illustrated in Fig. 2, cut-off points and recursion points of the complete prefix are
labeled with the corresponding global states of the system m in order to characterize
the possible continuations of the behavior.

3.2 Implicit Test Purposes and Test Coverage 

As each maximal configuration of a complete prefix of the labeled event structure 
represents a significant behavior, it is desirable that a test suite tries to execute each 
maximal configuration of the complete prefix at least once. We also regard it as suffi- 
cient to execute each maximal configuration of the complete prefix once. This limits 
the size of the test suite. At the cut-off points of the complete prefix, behavior that has 
been encountered before is repeated anyway. By generating a larger test suite covering
more than the complete prefix, one attains a higher test coverage and a higher
degree of confidence that the IUT will operate free of error when actions are executed
repeatedly. In principle, if the IUT is regarded as a black box, it remains uncertain
whether or not it will operate free of error when the same actions are executed next
time. Based on knowledge about the inner structure of an implementation (e.g. about
the reliability of the operating system, about the programming language used, etc.),
however, it is often inferred that an implementation will work free of error any number
of times if it does so at least once.

For each maximal configuration of the complete prefix of the labeled event struc- 
ture of a system of asynchronously communicating EFSM’s a test case is to be gener- 
ated. Its test purpose is to check the behavior described by the corresponding maxi- 
mal configuration. 

By covering each maximal configuration of the complete prefix, we achieve all- 
nodes coverage (or all-events coverage) w.r.t. the complete prefix. We do not neces- 
sarily achieve all-transition coverage w.r.t. the underlying system of asynchronously 
communicating EFSM’s due to the fact that the EFSM’s may contain transitions that 
are never triggered in normal interaction with the other EFSM’s of the system. 

3.3 Algorithm for Generating Test Purposes 

Overview. The goal is to construct a set of test purpose descriptions in form of 
MSC’s from the complete prefix of the labeled event structure of a system of 
asynchronously communicating EFSM’s. The generation of test purposes is carried 
out in the following steps, which are implemented as a prototype tool [15]: 

1. Identify all maximal configurations of the complete prefix; 

2. Restrict the maximal configurations to events occurring at the PCO’s; 

3. For each restricted maximal configuration, check whether it is included in another 
maximal configuration, and if so, eliminate it from the set of maximal configura- 
tions; 

4. Format the maximal configurations as MSC’s. 

Identification of Maximal Configurations. In order to obtain the set of events belonging
to a maximal configuration, we start from the cut-off points and follow the
causality relation backwards to the roots. First, all the maximal events at a cut-off
point are put into an initially empty event queue and into an initially empty event set.
While the event queue is not empty, the first event is taken from the queue, and all
its predecessors that are not yet in the event set are put into the event queue
and into the event set. When the loop terminates, all the events belonging to the
maximal configuration have been put into the event set. After a maximal configuration
is obtained, it is added to the set of maximal configurations.

The identification of all maximal configurations is described in pseudo-code below.
mconf_i denotes a maximal configuration from the set of all maximal configurations
MCONF. cutoff_i denotes the set of maximal events at a cut-off point. CUTOFF denotes
the set of all cut-off points of the complete prefix. pred_queue is the queue
data structure for processing the predecessor events.

MCONF := ∅;
for all cutoff_i ∈ CUTOFF do
    mconf_i := (∅, ∅, ∅, ∅);
    pred_queue := ∅;
    for all e_j ∈ cutoff_i do
        mconf_i.E := mconf_i.E ∪ {e_j};
        put(pred_queue, e_j);
    endfor;
    while not empty(pred_queue) do
        ev := get(pred_queue);
        for all e_j ∈ ev.predecessors do
            if e_j ∉ mconf_i.E then
                mconf_i.E := mconf_i.E ∪ {e_j};
                put(pred_queue, e_j);
            endif;
        endfor;
    endwhile;
    MCONF := MCONF ∪ {mconf_i};
endfor;
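
The backward traversal for a single cut-off point can be sketched in Python as follows. This is only an illustration: the function name and the `preds` dictionary (mapping each event to its immediate causal predecessors) are assumptions, and only the event set E of the configuration is collected, not the full quadruple.

```python
from collections import deque

def maximal_configuration(cutoff_events, preds):
    """Collect the cut-off events and all their causal predecessors."""
    events = set(cutoff_events)          # plays the role of mconf_i.E
    pred_queue = deque(cutoff_events)    # events whose predecessors are pending
    while pred_queue:
        ev = pred_queue.popleft()
        for p in preds.get(ev, ()):
            if p not in events:          # visit each event only once
                events.add(p)
                pred_queue.append(p)
    return events

# Illustrative causality: e1 -> e2 -> e3 and e1 -> e4.
preds = {"e2": ["e1"], "e3": ["e1", "e2"], "e4": ["e1"]}
```

Calling `maximal_configuration(["e3"], preds)` collects e1, e2 and e3 but not e4, which lies on a different branch.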

Restriction to Events at PCO’s. The restriction has to be done because only the 
events occurring at the system boundaries can be controlled or observed during black 
box testing. 

The process of restricting a maximal configuration to events occurring at the 
PCO’s consists of checking all events in the maximal configuration and omitting the 
events for which the remote communication partner is not the environment. In re- 
stricting the maximal configurations, the transitivity of the causality relation has to be 
preserved. 

Below, the restriction to events occurring at PCO’s is described in pseudo-code.

for all mconf_i ∈ MCONF do
    for all e_j ∈ mconf_i.E do
        if (l(e_j).rem ≠ "*") then
            mconf_i.E := mconf_i.E \ {e_j};
        endif;
    endfor;
endfor;
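
One way to keep the causality relation transitive while dropping internal events is to close the relation first and only then filter it. The sketch below assumes this strategy; the helper names, the `remote` dictionary, and the use of "*" as the environment marker are illustrative assumptions, not the prototype tool's interface.

```python
ENV = "*"  # assumed marker for the environment as remote partner

def transitive_closure(pairs):
    """Return the transitive closure of a set of (before, after) pairs."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure

def restrict_to_pcos(events, causes, remote):
    """Keep only events whose remote partner is the environment.

    `causes` must already be transitively closed, so that orderings which
    ran through removed events survive the filtering.
    """
    kept = {e for e in events if remote[e] == ENV}
    kept_causes = {(a, b) for (a, b) in causes if a in kept and b in kept}
    return kept, kept_causes

# Illustrative chain: e1 (at a PCO) -> e2 (internal) -> e3 (at a PCO).
remote = {"e1": "*", "e2": "m", "e3": "*"}
closed = transitive_closure({("e1", "e2"), ("e2", "e3")})
```

Here `restrict_to_pcos({"e1", "e2", "e3"}, closed, remote)` keeps e1 and e3 together with the ordering e1 before e3, although the connecting event e2 has been removed.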

Inclusion Checking. In order to get a minimal set of maximal configurations, each 
configuration is checked, after restricting it to the events occurring at the PCO’s, 
whether it is included in another configuration in the obtained set of restricted maxi- 
mal configurations. If so, it is removed from the set. 

Below is pseudo-code for the inclusion checking.

for all mconf_i ∈ MCONF do
    for all mconf_j ∈ MCONF (i ≠ j) do
        if mconf_i.E ⊆ mconf_j.E then
            MCONF := MCONF \ {mconf_i};
        endif;
    endfor;
endfor;
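
A compact Python rendering of the inclusion check is sketched below. It assumes that each restricted configuration is represented by its event set only, and it adds one design decision that the pseudo-code leaves open: identical event sets are merged first so that exactly one copy survives.

```python
def remove_included(configs):
    """Drop every event set that is properly contained in another one."""
    unique = []
    for c in configs:
        c = frozenset(c)
        if c not in unique:      # merge identical configurations
            unique.append(c)
    # `c < other` is the proper-subset test on frozensets.
    return [c for c in unique if not any(c < other for other in unique)]
```

For example, `remove_included([{1, 2}, {1}, {1, 2}, {3}])` keeps only the event sets {1, 2} and {3}.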
Formatting Maximal Configurations as MSC’s. The test purpose descriptions can
be laid out as process-level MSC’s or as system-level MSC’s. 

A maximal configuration of a complete prefix of the labeled event structure of a 
system of asynchronously communicating EFSM’s can be straightforwardly inter- 
preted as a process-level MSC with one instance for every EFSM associated with a 
PCO and one instance for every PCO. This way, the concurrency of different EFSM’s 
remains unhidden. Fig. 3 shows a test purpose description for the example in Fig. 1 in 
form of a process-level MSC. The interfaces to the environment on transmitter and 
receiver side are referred to as PCO_t and PCO_r, respectively.

On the other hand, test purpose descriptions for Autolink are stored as system- 
level MSC’s containing only one instance for the whole system under test and one 
instance for every PCO [4]. This way, the concurrency of different components of the 
system is hidden. To make the output of our tool applicable as input to Autolink, our 
tool also generates system-level MSC’s. 

To generate system-level MSC’s, we have to linearize the maximal configurations. 
A linearization of a partially ordered event set is a total order on this event set that 
contains the partial order. A linearization can be derived from a configuration by add- 
ing arbitrary ordering constraints to the partial order of the configuration. 

In order to get a linearization for a maximal configuration restricted to the PCO’s,
first, all the events in the maximal configuration are put into an initially empty event
queue. While the event queue is not empty, the first event is taken from the queue and
it is checked whether all its predecessor events are already included in the linearization.
If so, the event is added to the linearization. The first event added to the linearization
will be an initial event without any predecessor. If not all predecessor events are in the
linearization yet, the event is put back into the event queue.

The linearization of maximal configurations is described in pseudo-code below. 
mconf_i.seq denotes the linearization of a maximal configuration mconf_i. e_queue is 
the queue data structure for processing the events. 

for all mconf_i ∈ MCONF do 
    mconf_i.seq := ∅; 
    e_queue := ∅; 
    for all e_j ∈ mconf_i.E do 
        put(e_queue, e_j); 
    endfor; 

    while not empty(e_queue) do 
        ev := get(e_queue); 
        if (ev.predecessors ⊆ mconf_i.seq) then 
            mconf_i.seq := concatenate(mconf_i.seq, ev); 
        else put(e_queue, ev); 
        endif; 
    endwhile; 
endfor; 
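
The queue-based linearization above can be sketched in Python as follows. This is a minimal illustration rather than the tool's implementation: the function name, the dictionary representation of predecessors, and the guard against cyclic input are assumptions added for the example.

```python
from collections import deque


def linearize(events, predecessors):
    """Return a total order of `events` that respects `predecessors`.

    events: iterable of hashable event names (one maximal configuration).
    predecessors: dict mapping an event to the set of events that must
        precede it; events missing from the dict have no predecessors.
    """
    queue = deque(events)
    seq = []          # the linearization built so far
    placed = set()    # same content as seq, for fast subset tests
    stalled = 0       # consecutive events put back without progress
    while queue:
        ev = queue.popleft()
        if predecessors.get(ev, set()) <= placed:
            seq.append(ev)
            placed.add(ev)
            stalled = 0
        else:
            queue.append(ev)  # not ready yet: put it back into the queue
            stalled += 1
            if stalled > len(queue):
                # Every queued event was retried without progress
                # (added guard; the pseudo-code assumes a partial order).
                raise ValueError("cyclic or missing predecessor")
    return seq


# Hypothetical configuration: a precedes b and c, which both precede d.
preds = {"b": {"a"}, "c": {"a"}, "d": {"b", "c"}}
print(linearize(["d", "b", "c", "a"], preds))  # ['a', 'b', 'c', 'd']
```

Because events that are not yet enabled are re-queued, the result depends on the initial queue order; any order it returns is one of the possibly many linearizations of the configuration.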

Fig. 4 shows a test purpose description for the example in Fig. 1 in form of a sys- 
tem-level MSC. 
3.4 Data Flow Aspects 

The complete prefix of the labeled event structure constructed from a set of asynchro- 
nously communicating EFSM’s without enabling conditions for transitions may con- 
tain variables that are used for representing message parameters, for buffering values, 
or for calculations to be carried out during the execution of transitions. It does not 
contain enabling conditions for the occurrence of events. Therefore, the occurrence of 
each configuration in the complete prefix is feasible. 

Some data flow oriented test selection criteria that have been introduced for speci- 
fications represented by directed graphs can be transferred to labeled event structures. 
These criteria establish associations between definitions and uses of variables. Such 
associations are identified by tracking variables through the specification, following 
them as they are modified, until they are ultimately used in outputs or to compute val- 
ues for other variables. The criteria require that each of these associations be exam- 
ined at least once during testing. The intuition behind the selection of tests based on 
the coverage of data flow associations is that faults in a system may lead to incorrect 
values and, as a result of propagation through computations, an error may show up at 
the system’s output. 

The all-uses coverage criterion is satisfied w.r.t. the complete prefix of the labeled 
event structure if for each variable defined in the complete prefix each subsequent use 
of that variable (i.e., each def-use pair) is covered by at least one test. Even if there 
are no definitions without subsequent use within the underlying system of asynchro- 
nously communicating EFSM’s, not necessarily all variables defined within the com- 
plete prefix of the labeled event structure are used within the complete prefix. To 
achieve full all-uses coverage, our tool appends sub-structures of the complete prefix 
to the cut-off points whenever necessary and possible for covering definitions without 
use within the complete prefix. 
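
As a rough illustration of the all-uses criterion, the following Python sketch reports the def-use pairs that no test covers. The representation of a pair as a (variable, defining event, using event) tuple and all names are assumptions made for this example; they are not taken from the tool.

```python
def uncovered_def_use_pairs(def_use_pairs, tests):
    """Return the def-use pairs that are not covered by any test.

    def_use_pairs: set of (variable, defining_event, using_event) tuples.
    tests: dict mapping a test name to the set of pairs it exercises.
    """
    covered = set()
    for pairs in tests.values():
        covered |= set(pairs)
    return set(def_use_pairs) - covered


# Hypothetical prefix with three def-use pairs and two tests.
pairs = {("x", "e1", "e3"), ("x", "e1", "e4"), ("y", "e2", "e4")}
tests = {"t1": {("x", "e1", "e3")}, "t2": {("y", "e2", "e4")}}
print(uncovered_def_use_pairs(pairs, tests))  # {('x', 'e1', 'e4')}
```

All-uses coverage is reached when the returned set is empty; a non-empty result points at the associations for which further tests, or an extension beyond the cut-off points, would be needed.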



4 Summary and Outlook 

The approach introduced in this paper generates test purpose descriptions in form of 
MSC’s from a non-interleaving model, viz. from a complete prefix of the labeled 
event structure constructed from a system of asynchronously communicating 
EFSM’s. 

This model alleviates the state-explosion problem and preserves true concurrency. 
The size of the resulting test suite is restricted in a suitable way. The approach is 
applicable to a large class of specifications. The executability of the test cases is en- 
sured. 

A prototype tool implementing the approach described in this paper is available 
[15]. Its input is generated by the prototype tool for constructing a complete prefix of 
the labeled event structure from a generalized model of asynchronously communi- 
cating state machines [6]. Together with the corresponding system specification, the 
output of the test purpose tool is intended as input for test generation tools that take 
explicit descriptions of test purposes as input. 
Fig. 4. A test purpose description as system-level MSC 

As an alternative to the textual and tabular presentation formats, the new version of 
TTCN (TTCN-3) [16] allows describing tests in a graphical presentation format 
based on a subset of MSC’s. The MSC’s generated by our tool describe only the 
desired behavior to be checked in a test case. Therefore, the generated MSC’s are 
used as test purpose descriptions. MSC’s for defining test cases have to describe the 
behavior of the test components interacting with the IUT and the test context via the 
PCO’s and to cover possible behavior alternatives, which would lead to inconclusive 
or fail verdicts. The verdicts have to be included in a test case as well. The direct 
generation of MSC test cases from a labeled event structure is an area of future work. 
Abstract. An embedded system is a combination of hardware and software 
subsystems. Interaction between these two subsystems may lead to unexpected 
behavior when faults are present in either. An effective technique is required to 
detect the presence of such “interaction faults” in an embedded system. We 
propose a test data selection technique for interaction testing in the embedded 
system using hardware fault injection and mutation test criteria. The proposed 
technique simulates hardware faults as software faults and uses these to mutate 
the software component. The mutants so created are then used as a means to se- 
lect test data that differentiates the original program from the mutants. An ex- 
perimental evaluation of the proposed technique is also presented. 



1 Introduction 

An embedded system [1] comprises hardware and software components. Examples 
include nuclear power plant systems, medical devices, electric home appliances, 
and most devices in common use. The importance of testing embedded systems has 
grown with the rapid increase in their complexity. For safety-critical embedded 
systems in particular, such as nuclear power plants and medical devices, testing 
the entire system involves high cost and risk. Even when no faults are detected 
while testing the hardware or the software alone, the combined system can still 
behave unexpectedly, which calls for testing techniques directed at the 
detection of interaction faults. 

We propose a test data selection technique as an application of the well-known mu- 
tation based test data selection techniques [2,3] to detect faults due to the interaction 
between the hardware and software components. Program mutation is a technique to 
select input test data that differentiates the original program from its mutants. In the 
proposed technique, we first identify parts of hardware, software and their interaction 
from the system requirements. Next, a simulator (a program) is prepared to simulate 
the behavior of the embedded system. Hardware faults are simulated as software 
faults in the simulator. These software faults are then injected into the simulator to 
generate the mutated programs. The input data, which differentiates the simulator 
program from its mutant, is selected as the test data that we expect to be capable of 
detecting the interaction faults in the embedded system. We apply our technique to a 
safety-critical embedded system, namely DPPS (Digital Plant Protection System) [4] 
in order to select test data to detect interaction faults and to show the effectiveness of 
the test data selected. 

The remainder of this paper is organized as follows. In Section 2, we provide an 
overview of DPPS. In Section 3 we explain the test data selection technique to detect 
interaction faults in the embedded system. In Section 4 we illustrate the application of 
our technique to DPPS, and we analyze its effectiveness in Section 5. Our 
conclusions, a summary, and questions of further interest are presented in Section 6. 



2 Digital Plant Protection System 

An overview of DPPS is provided in this section. DPPS is a reactor protection system 
organized into four independent channels. When a problem occurs, DPPS emits a trip 
signal to halt the system and bring it into a safe state. As shown in Figure 1, DPPS 
consists of a bistable processor, a coincidence processor, and an initiation 
processor. 
Fig. 1. Structure of DPPS 



The bistable processor receives an analog signal from the sensor that monitors the 
status of the nuclear plant and a digital signal from CPC. The bistable processor 
compares the two incoming signals with the set point. If either signal is higher or 
lower than the set point, it sends the trip signal to the coincidence processor. The 
coincidence processor verifies the output of the bistable processor and sends the 
trip initiation signal to the initiation processor when the trip signal is present in 
more than 2 out of 4 channels. After receiving the trip initiation signal, the 
initiation processor sends the TCB signal, which commands the system to stop. Since 
DPPS is composed of 4 identical channels, we use one channel, namely channel A, as 
an example in this paper. 
3 Test Data Selection Technique 

Figure 2 shows the proposed test data selection procedure. As shown, we analyze the 
specification of the target system to identify its hardware parts, its software parts, 
and their interactions. We also generate Program S, which simulates the behavior of 
the system under study (also referred to as the “target system”). A hardware fault 
that can occur in the target system is identified and transformed into an equivalent 
software fault. We generate Program P_fH by injecting this software fault into 
Program S. Next, we construct test data that can differentiate between Programs S 
and P_fH. 




Fig. 2. Procedure of test data selection 



3.1 Analysis of System Specification 

An analysis diagram describes the target system as hardware, software with the target 
system’s behavior implemented, and the interactions between them. As shown in 
Figure 3, an analysis diagram illustrates the operational cycle of the embedded sys- 
tem. 




Fig. 3. Analysis diagram 



Upon entry into the target system, the hardware signal is transformed into the soft- 
ware input. The software component in the target system sends out output, which is 
transformed into a hardware signal that is the output of the target system. In Figure 3, 
the hardware component is drawn as a rectangle, software to be embedded in hardware 
as a rounded rectangle, a hardware signal as a solid line, software I/O as a dotted 
line, and an interaction as an ellipse. 
3.2 Generation of Program S 

Program S, which simulates the behavior of the target system, is written in the C 
programming language. S covers not only the behavior of the target system but also 
the interaction between hardware and software. 

3.3 Generation of Program P_fH 

After a hardware fault is transformed into an equivalent software fault, we generate 
program P_fH by injecting the software fault into program S. To do this, it is first 
necessary to classify the kinds of hardware faults. As shown in Table 1, the hardware 
faults are classified as Stuck-at 0 [5,6,7], Stuck-at 1 [5,6,7], Bridging fault [5,8], 
Open fault [5], Bit-flip fault [5,7], Power surge fault [5,9,10], and Spurious current 
fault [5,6,9,10,11]. Table 1 lists each class of hardware fault together with its 
description and its impact. 



Table 1. Hardware faults 

| Fault Type | Description | Impact |
|---|---|---|
| Stuck-at 0 | The result value is fixed to 0. | The result value always comes out to be 0. |
| Stuck-at 1 | The result value is fixed to 1. | The result value always comes out to be 1. |
| Bridging | When there are more than two crossing lines, the number of lines crossed varies. | The number of lines crossed is verified. |
| Open | Resistance on either the line or the block occurs due to a bad connection. | The value associated with the line or the block is modified to a different value. |
| Bit-flip | The bit flips. | The variable based on the modified bit is verified. |
| Power surge | Inconsistent power is supplied. | The problem is not confined to the value at a specific location; almost the entire system is affected. |
| Spurious current | Exposure to heavy ions. | The problem is not confined to the value at a specific location; almost the entire system is affected. |



The hardware fault can be modeled as a software fault somewhere in Program S. 
Two steps are necessary to transform a hardware fault present in the embedded 
system into an equivalent software fault. The first step is to identify where in 
Program S the hardware fault has an effect and to determine the location at which to 
inject an equivalent software fault. The second step is to decide how to transform 
the hardware fault at the injection target into a software fault. 

Determination of Hardware Fault Injection Target. The part of Program S, the 
software component, that is affected by a hardware fault is defined as a variable at 
its last location in S; this is the location where we expect the impact of the 
hardware fault. If there is a hardware fault, an unexpected value is likely to 
propagate to the “last variable.” Here, by “last variable” of S we refer to the last 
location in S where each variable is defined. To detect the interaction faults, we 
propose a test data selection technique based on hardware fault injection and 
mutation test criteria. The Fault Injection Target (FIT) is the last location in S at 
which the variable affected by the hardware fault is defined. For example, for the 
Open, Bridging, Bit-flip, Stuck-at 0, and Stuck-at 1 faults, the location affected by 
the fault is known, as shown in Table 1, so it is possible to anticipate how the fault 
affects S. Therefore the last definition of the affected variable in S is chosen as 
the FIT. Since the Spurious current and Power surge faults affect the entire system, 
the last output variable becomes the FIT. 

Generation of Program P_fH by Transforming the Hardware Fault into Software 
Fault. The hardware faults identified in Table 1 are transformed into software 
faults by means of a code patch. A code patch is program code added to modify the 
FIT variable, that is, the last variable in S affected by the hardware fault. The 
code patch method for transforming hardware faults into software faults is as 
follows: 

[Code Patch Method] 

- If the FIT variable holds a digital signal value, negate the value. 

- If the FIT variable holds an analog signal value, replace the value with a 
random value. 
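
The two rules of the code patch method can be sketched as a small Python function. This is only an illustration under stated assumptions: the paper patches C code, whereas the sketch works on plain values; the analog value range and the choice to exclude the original value from the random draw are hypothetical.

```python
import random


def apply_code_patch(value, signal_kind, rng, analog_range=(0, 4095)):
    """Return the patched value of a FIT variable.

    signal_kind: "digital" or "analog" (labels chosen for this sketch).
    analog_range: hypothetical inclusive value range; the paper states none.
    """
    if signal_kind == "digital":
        return 0 if value else 1  # negate the digital value
    if signal_kind == "analog":
        low, high = analog_range
        # Assumption: exclude the original value so the patch is visible.
        choices = [v for v in range(low, high + 1) if v != value]
        return rng.choice(choices)
    raise ValueError(f"unknown signal kind: {signal_kind!r}")


rng = random.Random(0)  # seeded only to make the sketch repeatable
print(apply_code_patch(1, "digital", rng))    # 0
print(apply_code_patch(1127, "analog", rng))  # some value other than 1127
```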



3.4 Selection of Test Data 

The objective of this paper is to select test data capable of detecting faults that 
may occur due to the interaction of hardware and software. The test data T of 
program S is defined as the input data that makes the output of program S differ 
from the output of P_fH. 

$$T = \{\, t \mid S(t) \neq P_{f_{Hi}}(t),\ 1 \le i \le n \,\}, \quad n = \text{number of FITs}$$
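
The definition of T can be read as a filter over candidate inputs. The Python sketch below assumes that an input is selected as soon as it distinguishes S from at least one mutant, and it records which mutants each input distinguishes. The toy programs are stand-ins invented for the example, not the DPPS simulator.

```python
def select_test_data(candidates, program_s, mutants):
    """Map each selected input t to the indices i with S(t) != P_i(t)."""
    selected = {}
    for t in candidates:
        expected = program_s(t)
        killed = [i for i, mutant in enumerate(mutants, start=1)
                  if mutant(t) != expected]
        if killed:
            selected[t] = killed
    return selected


def make_program(set_point):
    """Toy program: trip ("T") when the input reaches the set point."""
    return lambda t: "T" if t >= set_point else "NT"


program_s = make_program(500)                    # hypothetical original
mutants = [make_program(0), make_program(700)]   # hypothetical mutants
print(select_test_data([0, 600, 900], program_s, mutants))
# {0: [1], 600: [2]}
```

The input 900 is not selected because S and both mutants produce the same output for it, so it cannot reveal either injected fault.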



4 An Application Example to DPPS 

This section describes an example of applying the proposed test data selection 
technique to DPPS channel A. 



4.1 Analysis of DPPS Specification 

Analyzing the DPPS specification and expressing it in the notation of Figure 3 shows 
that the hardware consists of a board with an Intel 80c196 processor and I/O devices, 
and that the software embedded on the board implements the functions of a bistable 
processor, a coincidence processor, and an initiation processor. The hardware, 
software, and interaction parts of DPPS identified from Figure 4 are shown in Table 2. 
Fig. 4. Analysis diagram of DPPS channel A 



Table 2. Hardware, Software, and Interaction Parts of DPPS channel A 

| No. | Hardware | Software | Interaction |
|---|---|---|---|
| H1 | Input device 1 | N/A | N/A |
| H2 | Bistable processor | Implementation of bistable processor | H1→H2 |
| H3 | Input device 2 | N/A | N/A |
| H4 | Coincidence processor | Implementation of coincidence processor | H3→H4 |
| H5 | Initiation processor | Implementation of initiation processor | H5→H6 |
| H6 | Output device | N/A | N/A |



4.2 Generation of Program S 

The simulation Program S is implemented in C and covers the behavior of the bistable 
processor, the coincidence processor, and the initiation processor of DPPS channel A, 
together with the interaction part. 

In DPPS, the hardware and software interaction part converts the hardware signal 
input into a software input value that can be processed by the bistable or the 
coincidence processor. Likewise, to send the result computed by the initiation 
processor to the output device, the software output is converted into a hardware 
signal. Table 3 shows how the interaction parts identified in Table 2 are 
represented in Program S. For example, the part H1→H2, in which the hardware signal 
entered through the input device is transformed into a software input value, is 
represented in software by analog[i].measure and digital[i].trip. 



Table 3. Interaction Parts in DPPS channel A’s software 

| Interaction part | Interaction parts in DPPS software |
|---|---|
| H1→H2 | analog[i].measure, digital[i].trip |
| H3→H4 | analog[i].trip_bistable[B], digital[i].trip_bistable[B], analog[i].trip_bistable[C], digital[i].trip_bistable[C], analog[i].trip_bistable[D], digital[i].trip_bistable[D] |
| H5→H6 | analog[i].initiation, digital[i].initiation |
4.3 Generation of Program P_fH 

Determination of the Hardware Fault Injection Target. Table 4 lists the hardware 
faults expected to occur in DPPS and the locations of the FITs obtained when these 
faults are transformed into software faults. The hardware faults occurring in DPPS 
channel A are identified according to the hardware fault classification in Table 1. 

For each fault, Table 4 gives the hardware location where the fault occurs (the 
number assigned in Table 2) and the fault injection target used when the fault is 
injected into the software. For DPPS channel A, a total of 76 FITs were identified. 



Table 4. FITs in DPPS channel A’s software 

| H/W Fault | Hardware | FIT in DPPS software |
|---|---|---|
| Stuck-at 0 | H1, H2 | f_H1: analog[i].measure, f_H2: digital[i].trip |
| | H2 | f_H3: analog[i].trip_set, f_H4: digital[i].trip_set, f_H5: analog[i].opby_set, f_H6: digital[i].opby_set |
| | H3, H4 | f_H7: analog[i].trip_bistable[B], f_H8: digital[i].trip_bistable[B], f_H9: analog[i].trip_bistable[C], f_H10: digital[i].trip_bistable[C], f_H11: analog[i].trip_bistable[D], f_H12: digital[i].trip_bistable[D] |
| | H4 | f_H13: analog[i].trip_coincidence, f_H14: digital[i].trip_coincidence |
| | H5, H6 | f_H15: analog[i].trip_initiation, f_H16: digital[i].trip_initiation |
| Stuck-at 1 | H1, H2 | f_H17: analog[i].measure, f_H18: digital[i].trip |
| | H2 | f_H19: analog[i].trip_set, f_H20: digital[i].trip_set, f_H21: analog[i].opby_set, f_H22: digital[i].opby_set |
| | H3, H4 | f_H23: analog[i].trip_bistable[B], f_H24: digital[i].trip_bistable[B], f_H25: analog[i].trip_bistable[C], f_H26: digital[i].trip_bistable[C], f_H27: analog[i].trip_bistable[D], f_H28: digital[i].trip_bistable[D] |
| | H4 | f_H29: analog[i].trip_coincidence, f_H30: digital[i].trip_coincidence |
| | H5, H6 | f_H31: analog[i].trip_initiation, f_H32: digital[i].trip_initiation |
| Bridging | H2, H4 | f_H33: analog[i].measure, f_H34: digital[i].trip, f_H35: analog[i].trip_bistable[B], f_H36: digital[i].trip_bistable[B], f_H37: analog[i].trip_bistable[C], f_H38: digital[i].trip_bistable[C], f_H39: analog[i].trip_bistable[D], f_H40: digital[i].trip_bistable[D] |
| Open | H1, H2 | f_H41: analog[i].measure, f_H42: digital[i].trip |
| | H2 | f_H43: analog[i].trip_set, f_H44: digital[i].trip_set, f_H45: analog[i].opby_set, f_H46: digital[i].opby_set |
| | H3, H4 | f_H47: analog[i].trip_bistable[B], f_H48: digital[i].trip_bistable[B], f_H49: analog[i].trip_bistable[C], f_H50: digital[i].trip_bistable[C], f_H51: analog[i].trip_bistable[D], f_H52: digital[i].trip_bistable[D] |
| | H4 | f_H53: analog[i].trip_coincidence, f_H54: digital[i].trip_coincidence |
| | H5, H6 | f_H55: analog[i].trip_initiation, f_H56: digital[i].trip_initiation |
| Bit-flip | H1, H2 | f_H57: analog[i].measure, f_H58: digital[i].trip |
| | H2 | f_H59: analog[i].trip_set, f_H60: digital[i].trip_set, f_H61: analog[i].opby_set, f_H62: digital[i].opby_set |
| | H3, H4 | f_H63: analog[i].trip_bistable[B], f_H64: digital[i].trip_bistable[B], f_H65: analog[i].trip_bistable[C], f_H66: digital[i].trip_bistable[C], f_H67: analog[i].trip_bistable[D], f_H68: digital[i].trip_bistable[D] |
| | H4 | f_H69: analog[i].trip_coincidence, f_H70: digital[i].trip_coincidence |
| | H5, H6 | f_H71: analog[i].trip_initiation, f_H72: digital[i].trip_initiation |
| Power surge | H5, H6 | f_H73: analog[i].trip_initiation, f_H74: digital[i].trip_initiation |
| Spurious current | H5, H6 | f_H75: analog[i].trip_initiation, f_H76: digital[i].trip_initiation |
Generation of the Program P_fH by Transforming the Hardware Fault into 
Software Fault. The 76 FITs identified in Table 4 are transformed into software 
faults by applying the code patch. Figure 5 shows Program S and Program P_fH3; 
P_fH3 is generated by patching the code ‘analog[0].trip_set=0;’ into S. Apart from 
the patched code at the FIT, the code of Programs S and P_fH3 is identical. 



Program S: 

void set_point(analog_struct *analog, digital_struct *digital){... 
    analog[0].trip_set = 1127; 
    analog[1].trip_set = 500; ...} 

void input(){... 
    for(i=0; i<D_MAX; i++){ ... 
        for(j=1; j<CH_MAX; j++){ 
            clear_screen(); 
            lcdMoveCursor(0, 0); 
            lcdPutString("Enter the"); 
        }} 
    digital[0].trip = 0; /* Interaction Fault */ 
} 

Program P_fH3: 

void set_point(analog_struct *analog, digital_struct *digital){ 
    analog[0].trip_set = 1127; 
    analog[0].trip_set = 0; /* Stuck at 0 */ 
    analog[1].trip_set = 500; ...} 

void input(){... 
    for(i=0; i<D_MAX; i++){ ... 
        for(j=1; j<CH_MAX; j++){ 
            clear_screen(); 
            lcdMoveCursor(0, 0); 
            lcdPutString("Enter the"); 
        }} 
    digital[0].trip = 0; /* Interaction Fault */ 
} 

Fig. 5. S and P_fH3 



4.4 Selection of DPPS Test Data 

DPPS channel A in Program S has 5 input variables, 3 analog and 2 digital, and its 
output is a trip signal. The test data T is selected as the input data that makes the 
outputs of Program S and P_fH differ. In Table 5, t is one test datum in T, O(t) is 
the expected output for t, S(t) is the result of applying t to S, and P_fH3(t) is the 
result of applying t to P_fH3. NT denotes that No Trip occurs, and T denotes that a 
Trip occurs. 



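Table 5. O(t), S(t), P_fH3(t) for test data t 

| Input variable | Analog_0 | Analog_1 | Analog_2 | Digital_0 | Digital_1 |
|---|---|---|---|---|---|
| t | 0 | 600 | 200 | 1 | 0 |

| Output variable | Trip_A0 | Trip_A1 | Trip_A2 | Trip_D0 | Trip_D1 |
|---|---|---|---|---|---|
| S(t) | NT | T | NT | NT | NT |
| P_fH3(t) | T | T | NT | NT | NT |
| O(t) | NT | T | NT | T | NT |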



For example, when the test data t = (0, 600, 200, 1, 0) is applied to S and P_fH3, 
the outputs differ as shown below. 

- S(t) = (NT, T, NT, NT, NT), where a trip occurred only in Analog_1. 

- P_fH3(t) = (T, T, NT, NT, NT), where trips occurred in Analog_0 and Analog_1. 

Therefore t is input data capable of differentiating the output of S from the output 
of P_fH3, and so it is one of the test data selectable by our proposed technique. As 
shown in Table 5, the expected output O(t) is (NT, T, NT, T, NT), which differs 
from S(t) = (NT, T, NT, NT, NT). This means that t can serve as data for detecting 
the interaction faults between hardware and software in DPPS channel A. A correct 
program would not contain ‘digital[0].trip=0;’, but Program S in Figure 5 contains 
this hardware and software interaction fault (the part shown in bold). The statement 
lies where the hardware signal read from the input device is transformed into a 
software signal; because this is a point of interaction between hardware and 
software, a hardware and software interaction fault can occur there in DPPS. Hence 
the selected test data t is capable of detecting the interaction fault 
‘digital[0].trip=0;’. 
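
The comparison of output tuples in this example can be replayed with a few lines of Python. The tuples are copied from Table 5; the helper name and the channel labels used as list entries are choices made for this sketch.

```python
CHANNELS = ("Analog_0", "Analog_1", "Analog_2", "Digital_0", "Digital_1")


def differing_channels(out_a, out_b):
    """Return the channels on which two output tuples disagree."""
    return [name for name, a, b in zip(CHANNELS, out_a, out_b) if a != b]


s_t = ("NT", "T", "NT", "NT", "NT")      # S(t)
p_fh3_t = ("T", "T", "NT", "NT", "NT")   # P_fH3(t)
o_t = ("NT", "T", "NT", "T", "NT")       # expected output O(t)

print(differing_channels(s_t, p_fh3_t))  # ['Analog_0']
print(differing_channels(s_t, o_t))      # ['Digital_0']
```

The first difference is what selects t (S and the mutant disagree); the second is what reveals the interaction fault in S (S disagrees with the expected output).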

Thus the goal of our proposed technique is to select test data capable of detecting 
the interaction faults between hardware and software. The advantage of our 
technique is described in Section 5 based on the results of the experiment. 



5 Empirical Studies and Analysis 

5.1 Environments and Procedures of Empirical Studies 

Environments. The DPPS channel A example of Section 4 is simulated by 4 different 
testers; the tests are executed by embedding Programs S1, S2, S3, and S4, written 
in the C programming language, on a hardware board that uses an Intel 80c196 
processor. 

In the environment shown in Figure 6, the input entered by the user is sent to the 
board through the PC and the board input cables, and the results computed by the 
board are displayed to the user as output. The board is powered from the PC through 
the power cable. 







Fig. 6. Experimental Environments. 



Procedure. Let S be the C language program for DPPS channel A without faults. 
Each Program S_i contains a different interaction fault, as shown in Table 6. The 
procedure for each Exp_i (1 ≤ i ≤ 4) is as follows: 

Step 1. Collect test data for Program S_i by applying a white-box test technique. 
In this experiment, generate the test data set Tg that satisfies the all-node, 
decision, c-use, p-use, and all-use criteria by using Automatic Test Analysis 
for C (ATAC) [12,13]. 







Interaction Testing in an Embedded System 201 



Step 2. Generate Program P_fH by inserting the hardware fault, transformed into a 
software fault, at the identified FIT in Program S_i shown in Table 2. 

Step 3. Execute the startup function related to the hardware configuration to 
operate the board. 

Step 4. Add the master code to Program S, Program S_i, and Program P_fH. 

Step 5. Compile the code with the cross compiler. 

Step 6. Mount the hex-formatted code on the target board. 

Step 7. Select as the test data set T those data from the set Tg collected in 
Step 1 that differentiate Program S_i and Program P_fH. 



5.2 Analysis 

To demonstrate the advantage of our proposed technique, we compared the number of 
test data needed for 100% coverage and the resulting fault detection rate. For each 
of Exp1, Exp2, Exp3, and Exp4, Table 6 lists the interaction fault present in S_i, 
the number of test data, the number of test data that found the fault, and the fault 
detection rate for random selection, the existing technique, and the proposed 
technique. In Table 6, ‘Random’ means that test data are selected randomly, 
‘Existing’ means that test data are selected with ATAC, and ‘Proposed’ means the 
proposed technique. TD is the number of Test Data consumed, FTD is the number of 
Test Data with Fault detected, and ATD is the Average number of Test Data. 



Table 6. Experimental data (fault detection rate in parentheses) 

| Exp_i | Interaction fault in S_i | Item | Random | Existing | Proposed, by number of test data (2 / 3 / 4 / 5 / 6) |
|---|---|---|---|---|---|
| Exp1 | Line #591: analog[1].trip_initiation=1; (should not be present in the correct code) | TD | 32 | 22 | 13 / 50 / 89 / 26 / 2 |
| | | FTD | 6 | 6 | 12 / 32 / 72 / 18 / 0 |
| | | ATD | 32 (0.19) | 22 (0.27) | 3.74 (0.75) |
| Exp2 | Line #424: analog[0].measure=0; (should not be present in the correct code) | TD | 32 | 20 | 9 / 54 / 97 / 32 / 0 |
| | | FTD | 5 | 5 | 9 / 34 / 78 / 18 / 0 |
| | | ATD | 32 (0.16) | 20 (0.25) | 3.79 (0.72) |
| Exp3 | Line #456: digital[0].trip_bistable[1]=1; (should not be present in the correct code) | TD | 32 | 23 | 5 / 49 / 105 / 27 / 0 |
| | | FTD | 6 | 6 | 5 / 38 / 92 / 22 / 0 |
| | | ATD | 32 (0.19) | 23 (0.26) | 4.01 (0.84) |
| Exp4 | Line #443: digital[0].trip=0; (should not be present in the correct code) | TD | 32 | 20 | 0 / 33 / 58 / 26 / 0 |
| | | FTD | 7 | 7 | 0 / 15 / 43 / 22 / 0 |
| | | ATD | 32 (0.22) | 20 (0.35) | 3.94 (0.68) |



Comparison of Number of Test Data. The randomly selected number of test data, the 
number of test data based on the existing technique, and the number of test data 
for our proposed technique are shown in Table 6 and in the bar graph of Figure 7. 

As shown for Exp1 in Table 6, the number of randomly selected test data is 32, and 
the number of test data Tg needed for 100% coverage with the existing technique is 
22. The test data sets that differentiate S1 from all the P_fH generated from S1, 
that is, the sets that achieve 100% coverage under our proposed technique, break 
down as follows: 13 sets with 2 test data, 50 with 3, 89 with 4, 26 with 5, and 2 
with 6. The average number of test data needed for 100% coverage under our 
proposed technique is therefore 3.74, which is small compared to the random and 
existing test techniques. Likewise, the average numbers of test data for Exp2, 
Exp3, and Exp4 are 3.79, 4.01, and 3.94 respectively, which are again much lower 
for our proposed technique than for the random and existing test data selection 
techniques. 
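
The average for Exp1 is a weighted mean over the set sizes. A short Python check, using the Exp1 counts quoted above (the function and variable names are chosen for this sketch):

```python
def average_test_data(set_counts):
    """Weighted mean of set sizes; set_counts maps size -> number of sets."""
    total_sets = sum(set_counts.values())
    total_data = sum(size * count for size, count in set_counts.items())
    return total_data / total_sets


exp1 = {2: 13, 3: 50, 4: 89, 5: 26, 6: 2}  # Exp1 counts from Table 6
print(sum(exp1.values()))                  # 180
print(round(average_test_data(exp1), 2))   # 3.74
```

The 180 sets contain 674 test data in total, and 674 / 180 rounds to 3.74.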




Fig. 7. Average number of test data 



Comparison of Fault Detection Rate. The fault detection rate was measured on S1, 
S2, S3, and S4, which were generated by artificially inserting an interaction 
fault into Program S. The fault detection rate is shown in parentheses in Table 6 
and in the bar graph of Figure 8. The fault detection rate is defined as follows: 

Fault Detection Rate = Number of Test Data Found with Faults / 

Total Number of Test Data Generated 







Fig. 8. Fault Detection Rate 
For Exp1, 6 out of 32 test data detected the fault with the random test data 
selection technique, so the fault detection rate is 0.19. With the existing 
technique, 6 out of 22 test data detected the fault, giving a fault detection rate 
of 0.27. With our proposed technique, by contrast, 135 out of a total of 180 
generated test data found the fault, giving a fault detection rate of 0.75. 
Likewise, the fault detection rates of the proposed technique for Exp2, Exp3, and 
Exp4 are 0.72, 0.84, and 0.68 respectively. As the bar graph in Fig. 8 illustrates, 
the proposed technique achieves a higher fault detection rate than the random and 
existing techniques. 
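
The Exp1 rates quoted above follow directly from the definition of the fault detection rate. A minimal Python check (the function name is chosen for this sketch; the counts are the ones stated in the text):

```python
def fault_detection_rate(found, generated):
    """Test data that found the fault divided by test data generated."""
    if generated <= 0:
        raise ValueError("generated must be positive")
    return found / generated


exp1 = [("Random", 6, 32), ("Existing", 6, 22), ("Proposed", 135, 180)]
for name, found, generated in exp1:
    print(name, round(fault_detection_rate(found, generated), 2))
# Random 0.19
# Existing 0.27
# Proposed 0.75
```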



6 Summary and Future Research 

The increasing complexity of embedded systems has raised the need for extensive 
testing of embedded systems. Testing an embedded system is expensive. In addition, 
making any modification to an embedded system for the purpose of testing, such as required 
by program mutation, is difficult. This is especially true of the system used in the 
DPPS example mentioned in this paper. 

In this paper, we proposed a test data selection technique to detect hardware and
software interaction faults in an embedded system by utilizing hardware fault
injection and program mutation, and applied our proposed technique to DPPS channel
A. In order to select test data capable of detecting the DPPS interaction faults, we
analyzed the different parts of DPPS, including hardware, software, and the interaction
between them, and also wrote Program S in the C programming language to simulate
the behavior of the target system. After extracting the hardware faults from the
analyzed hardware component, we determined the location for FIT, where the extracted
faults were to be inserted in Program S. Using the code patch, the hardware faults
were transformed into software faults and inserted into Program S, and the
fault-injected variants of Program S were generated. Next, input data capable of
differentiating the output of Program S from that of its fault-injected variants were
constructed to detect the faults caused by the interaction of hardware and software.

“Good” test data is test data that has a high fault detection rate and is small in
size. We conducted a case study to investigate the effectiveness of test data generated
by the proposed technique. In this study, we used an Intel 80C196 prototyping board.
The results from this study indicate that the proposed technique has a high fault
detection rate with a small number of test data.

In the future, we plan to implement our proposed technique in a tool, and thus
automate the generation of test data to detect faults caused by hardware-software
interactions in an embedded system. In addition, instead of limiting the application
to DPPS as in the case study of this paper, we plan to apply the proposed tool to
various embedded systems and conduct experiments to investigate the performance of
our technique.
Abstract. We adapt and extend the theories used in the general framework of
automated software testing in such a way that they become suitable for black-box 
conformance testing of thin client Internet applications. That is, we automatically 
test whether a running Internet application conforms to its formal specification. 
The actual implementation of the application is not taken into account, only its 
externally observable behaviour. In this paper, we show how to formally model 
this behaviour and how such formal specifications can serve as a basis for the 
automatic conformance testing of Internet applications. 



1 Introduction 

Since activity on the Internet is growing very fast, more and more systems appear that
are based on communication via the Internet. To give an example, in the United States
alone, 45 billion dollars' worth of products was sold via the Internet in 2002 [1].
This is an increase of 38% compared to the on-line sales in 2001. Apart from the number
of so-called Internet applications, the complexity of these applications increases too.
This increasing complexity leads to a growing number of errors in Internet applications,
examples of which can be found at The Risks Digest [2], amongst others. This increasing
number of errors calls for better testing of the applications and, preferably, this
testing should be automated.

Research has been done in the field of automated testing of applications that are not 
based on Internet communication [3]. In this paper, we adapt and extend the theories 
used in the general framework of automated software testing in such a way that they 
become suitable for the testing of Internet applications. 

We focus on black-box conformance testing of thin client Internet applications. That
is, given a running application and a (formal) specification, our goal is to
automatically test whether the implementation of the application conforms to the
specification. Black-box testing means that the actual implementation of the application
is not taken into account but only its externally observable behaviour: We test what the
application does, not how it is done. Interaction with the application takes place using
the interface that is available to normal users of the application. In this case, the
interface is based on communication via the Internet using the HTTP protocol [4].

A. Petrenko and A. Ulrich (Eds.): FATES 2003, LNCS 2931, pp. 205-222, 2004.

© Springer-Verlag Berlin Heidelberg 2004
As a start, in Section 2 the distinction between Web based and Window based
applications is drawn. Next, in Section 3 we introduce how we plan to automatically test
Internet applications. In Section 4, we describe the formalism we make use of for this
automatic testing. To show the usefulness of the framework, we give a practical
example in Section 5. We discuss related work in Section 6 and draw some final
conclusions in Section 7.



2 Web Based versus Window Based Applications 

In general, Web based applications, or Internet applications, behave like window based 
applications. They both communicate via a user interface with one or more clients. 
However, there are some major differences. 

The Internet applications we focus on are based on client-server communication
via the Internet. The application runs on a server which is connected to the Internet. 
Via this connection, clients who are also connected to the Internet can interact with 
the application using prescribed protocols. Clients send requests over the Internet to 
the server on which the application runs. The server receives the requests and returns 
calculated responses. 



[Figure: Web based interaction (clients send HTTP requests via third parties to a
server running the application, which returns HTTP responses) versus window based
interaction (a client uses a GUI directly connected to the application).]

Fig. 1. Internet interaction versus stand-alone interaction.



In Figure 1 a schematic overview of the communication with Internet applications
and window based applications is given. Clients interacting with window based
applications are using a (graphical) user interface which is directly connected to the
application. When interacting with Internet applications, the client sends an HTTP
request [4] via the Internet, i.e. via some third parties, to the server. The server
receives the request, which is subsequently passed to the application. After receiving
the request, the application calculates a response which is sent back to the requesting
client. As can be seen in Figure 1, when testing an Internet application we have to
take into account five entities, viz. clients, communication protocols, third parties,
web servers and the application itself.

Clients. The clients we focus on are so-called thin clients. This means that they have 
reduced or no possibility to do calculations. They make use of a centralised re- 
source to operate. In the context of Internet applications, thin clients are usually 
web browsers. In general, more than one client can simultaneously access an In- 
ternet application. Unlike stand-alone applications, clients can fail, i.e. they can 
“disappear”: a browser can simply be closed without notifying the application. 
Dependency on Third Parties. Since interaction takes place via the Internet,
communication depends on third parties. First of all, transmitted packets travel via
routers which control the Internet traffic. It is not known which route on the world
wide web is taken to get from the client to the server and back.

Apart from transmitting the requests and responses, there are more dependencies,
like DNS servers for translating domain names into IP addresses, trusted third
parties for verifying certificates and e-mail servers for both the sending and receiving
of e-mail messages.

Stand-alone applications usually do not depend on any of these parties.

Communication via the Internet. Most of the communication with Internet applications
we focus on is based on the HyperText Transfer Protocol (HTTP) [4]. This
protocol is request-response based. A web server waits for requests from
clients. As soon as a request comes in, the request is processed by an application
running on the server. It produces a response which is sent back. Since the
communication takes place via the Internet, delay times are unknown and communication
can fail. Therefore, messages can overtake other messages.

Web Servers. A web server is a piece of hardware connected to the Internet. In contrast
to stand-alone machines running a stand-alone application, a client might try to
access a web server which is down or overtaxed, causing the interaction to fail.

Internet Applications. The Internet application itself is running on a web server. The
applications we focus on are based on request-response interaction with multiple
clients. Since more than one client can interact with the application simultaneously,
there might be a notion of who is communicating with the application. By keeping
track of the interacting parties, requests and corresponding responses can be
grouped into so-called sessions.

The main differences between Internet based and window based applications are the
failing of clients and web servers, the failing of communication and overtaking of
messages between clients and the application, and the dependency on third parties.
Furthermore, Internet applications are request-response based, whereas window based
applications interact with the clients using a (graphical) user interface. Finally, most
Internet applications focus on parallel communication with more than one client, whereas
window based applications are mostly based on single user interaction. Since multiple
clients can share a common state space, testing Internet applications is fundamentally
different from testing window based applications. More differences between Web based
and Window based systems can be found in e.g. [5].



3 Testing Internet Applications 

Now that we have a notion of what Internet applications look like, we informally show 
how implementations of these applications can be tested. 

We focus on black-box testing, restricting ourselves to dynamic testing. This means
that the testing consists of actually executing the implemented system. We do this by
simulating real-life interaction with the applications, i.e. by simulating the clients that
interact with the application. The simulated clients interact in a similar way as real-life
clients would do. In this way, the application cannot distinguish between a real-life
client and a simulated one. See Figure 2 for a schematic overview of the test
environment.

Fig. 2. Automatic testing of Internet applications.

We make use of a tester which generates requests and receives responses. This is 
called test execution. By observing the responses, the tester can determine whether they 
are expected responses in the specification. If so, the implementation passes the test, if 
not, it fails. 

The tester itself consists of four components, based on [6]:

Specification. The specification is the formal description of how the application under 
test is expected to behave. 

Primer. The primer determines the requests to send by inspecting the specification and 
the current state the test is in. So the primer interacts with the specification and 
keeps track of the test’s state. Furthermore, the primer checks whether responses 
received by the tester are expected responses in the specification at the state the test 
is in. 

Driver. The driver is the central unit, controlling the execution of the tests. This com- 
ponent determines what actions to execute. Furthermore, the verdict whether the 
application passes the test is also computed by the driver. 

Adapter. The adapter is used for encoding abstract representations of requests into 
HTTP requests and for decoding HTTP responses into abstract representations of 
these responses. 

While executing a test, the driver determines if a request is sent or a response is 
checked. If the choice is made to send a request, the driver asks the primer for a correct 
request, based on the specification. The request is encoded using the adapter and sent 
to the application under test. If the driver determines to check a response, a response is 
decoded by the adapter. Next, the primer is asked whether the response is expected in 
the specification. Depending on the results, a verdict can be given on the conformance 
of the implementation to its specification. 
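
The interplay of the four components can be illustrated with a small Python sketch. It is not the tester of [6]: the toy specification, the label convention ("?" for requests, "!" for responses), the class and function names and the string-based stand-in for HTTP are all assumptions made for illustration. The driver here simply alternates between sending and checking, and the inconclusive verdict is not modelled.

```python
# Hypothetical toy specification: state -> {label: next state}.
# Labels starting with "?" are requests, labels starting with "!" are responses.
SPEC = {
    "start": {"?login": "wait"},
    "wait": {"!welcome": "done", "!denied": "start"},
    "done": {},
}


class Primer:
    """Inspects the specification and keeps track of the state the test is in."""

    def __init__(self, spec, initial):
        self.spec, self.state = spec, initial

    def next_request(self):
        """Return a request allowed in the current state, or None if there is none."""
        requests = sorted(a for a in self.spec[self.state] if a.startswith("?"))
        if not requests:
            return None
        self.state = self.spec[self.state][requests[0]]
        return requests[0]

    def is_expected(self, response):
        """Check a response against the specification; advance the state if expected."""
        if response in self.spec[self.state]:
            self.state = self.spec[self.state][response]
            return True
        return False


class Adapter:
    """Encodes abstract requests and decodes concrete responses (stand-in for HTTP)."""

    def encode(self, request):
        return "GET /" + request[1:]

    def decode(self, raw):
        return "!" + raw.strip().lower()


def drive(primer, adapter, application, max_steps=10):
    """Driver: alternately send a request and check the response; return the verdict."""
    for _ in range(max_steps):
        request = primer.next_request()
        if request is None:
            return "pass"
        raw = application(adapter.encode(request))
        if not primer.is_expected(adapter.decode(raw)):
            return "fail"
    return "pass"


def good_app(http_request):
    """Stub application that always answers with an expected response."""
    return "Welcome"


def bad_app(http_request):
    """Stub application that answers with a response the specification does not allow."""
    return "Error"
```

For instance, `drive(Primer(SPEC, "start"), Adapter(), good_app)` ends in the verdict pass, while the same call with `bad_app` ends in fail, because `!error` is not an expected response in state `wait`.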

As mentioned in Section 2, clients, web servers, their mutual communication and 
third parties can fail. In such a case, no verdict can be given on the correctness of the 
implementation of the Internet application. However, depending on the failure, it might 
be possible to determine the failing entity. 
4 Conformance Testing of Internet Applications

As a basis for conformance testing of Internet applications, we take the formal frame- 
work as introduced in [7-9]. Given a specification, the goal is to check, by means of 
testing, whether an implemented system satisfies its specification. To be able to for- 
mally test applications, there is a need for implementations and formal specifications. 
Then, conformance can be expressed as a relation on these two sets. 

Implementations under test are real objects which are treated as black boxes exhibit- 
ing behaviour and interacting with their environment. They are not amenable to formal 
reasoning, which makes it harder to formally specify the conformance relation. There- 
fore, we make the assumption that any implementation can be modelled by a formal 
object. This assumption is referred to as the test hypothesis [10] and allows us to handle 
implementations as formal objects. We can express conformance by a formal relation 
between a model of an implementation and a specification, a so-called implementation 
relation. 

An implementation is tested by performing experiments on it and observing its re- 
actions to these experiments. The specification of such an experiment is called a test 
case, a set of test cases a test suite. Applying a test to an implementation is called test 
execution and results in a verdict. If the implementation passes or fails the test case, the 
verdict will be pass or fail, respectively. If no verdict can be given, the verdict will be 
inconclusive. 

In the remainder of this section, we will instantiate the ingredients of the framework 
as sketched above. We give a formalism for both modelling implementations of Internet 
applications and for giving formal specifications used for test generation. Furthermore, 
we give an implementation relation. By doing this, we are able to test whether a (model 
of an) implementation conforms to its specification. Apart from that, we give an algo- 
rithm for generating test suites from specifications of Internet applications. 

4.1 Modelling Internet Applications 

To be able to formally test Internet applications, we need to formally model their be- 
haviour. Since we focus on conformance testing, we are mainly interested in the com- 
munication between the application and its users. We do not focus on the representation 
of data. Furthermore, we focus on black-box testing, which means that the internal state 
of applications is not known in the model. Finally, we focus on thin client Internet
applications that communicate using the HyperText Transfer Protocol (HTTP) [4]. As a
result, the applications show a request/response behaviour.

These observations lead to modelling Internet applications using labelled transition 
systems. Each transition in the model represents a communication action between the 
application and a client. The precise model is dictated by the interacting behaviour of 
the HTTP protocol. 

In general, an HTTP interaction is initiated by a client, sending a request for some 
information to an application. A request can be extended with parameters. These pa- 
rameters can be used by the application. After calculating a response, it is sent back 
to the requesting client. Normally, successive requests are not grouped. However, the 
grouping can be done by adding parameters to the requests and responses. In such a 
way, alternating sequences of requests and responses are turned into sessions. 
Note that we test the interaction behaviour of Internet applications communicating
via HTTP. We do not model the client-side tools to interact with Internet applications,
i.e., we do not model the behaviour of the application when using browser buttons like
stop, back, forward and refresh. The main reason for not including this behaviour is that
different client implementations cause distinct interaction behaviour.

Furthermore, we do not add (failure of) components in the system under test other 
than the application to the specification. This means that failure of any of these com- 
ponents leads to tests in which the result will be inconclusive. If all components in the 
system under test operate without failure, verdicts will be pass or fail. 

The tester should behave like a set of thin clients. The only requests sent to the 
application are the initial request which models the typing in of a URL in the browser’s 
address bar and requests that result from clicking on links or submitting forms which 
are contained in preceding responses. 

Since we focus on HTTP based Internet applications, and thus on sessions of al- 
ternating request-response communication with applications, we make use of so-called 
multi request-response transition systems (MRRTSs) for both modelling implementa- 
tions of Internet applications and giving formal specifications used for test generation. 
An MRRTS is a labelled transition system having extra structure. In the remainder of 
this section we explain MRRTSs in more detail and show how they relate to labelled 
transition systems and request-response transition systems (RRTSs). 



Labelled Transition Systems. The formalism of labelled transition systems is widely 
used for describing the behaviour of processes. We will provide the relevant definitions. 

Definition 1. A labelled transition system is a 4-tuple $\langle S, L, \to, s_0 \rangle$ where

- $S$ is a countable, non-empty set of states;

- $L$ is a countable set of labels;

- $\to\ \subseteq S \times L \times S$ is the transition relation;

- $s_0 \in S$ is the initial state.

Definition 2. Let $s_i$ ($i \in \mathbb{N}$) be states and $a_i$ ($i \in \mathbb{N}$) be labels. A (finite) composition
of transitions

$$s_1 \xrightarrow{a_1} s_2 \xrightarrow{a_2} \ldots\ s_n \xrightarrow{a_n} s_{n+1}$$

is then called a computation. The sequence of actions of a computation, $a_1 \cdot a_2 \cdot \ldots \cdot a_n$,
is called a trace. The empty trace is denoted by $\varepsilon$. If $L$ is a set of labels, the set of all
finite traces over $L$ is denoted by $L^*$.
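
As an illustration of Definitions 1 and 2, the following Python sketch enumerates the traces of all computations of bounded length in a small labelled transition system. The encoding of the transition relation as a set of triples and the toy system `TOY` are assumptions made for this example only.

```python
from typing import Set, Tuple

# The transition relation as a set of (source state, label, target state) triples.
Transition = Tuple[str, str, str]
Trace = Tuple[str, ...]


def traces_up_to(transitions: Set[Transition], initial: str, max_len: int) -> Set[Trace]:
    """Traces of all computations of length <= max_len that start in `initial`."""
    result: Set[Trace] = {()}        # the empty trace
    frontier = {((), initial)}       # pairs of (trace so far, state reached)
    for _ in range(max_len):
        next_frontier = set()
        for trace, state in frontier:
            for (src, label, dst) in transitions:
                if src == state:
                    next_frontier.add((trace + (label,), dst))
        result |= {trace for trace, _ in next_frontier}
        frontier = next_frontier
    return result


# Hypothetical three-state system: s0 -a-> s1, s1 -b-> s0, s1 -c-> s2.
TOY = {("s0", "a", "s1"), ("s1", "b", "s0"), ("s1", "c", "s2")}

if __name__ == "__main__":
    for trace in sorted(traces_up_to(TOY, "s0", 3), key=lambda t: (len(t), t)):
        print(" . ".join(trace) if trace else "(empty trace)")
```

The bound is needed because the set of all traces is infinite as soon as the system contains a cycle, as `TOY` does.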

Definition 3. Let $p = \langle S, L, \to, s_0 \rangle$, $s, s' \in S$, $S' \subseteq S$, $a_i \in L$ and $\sigma \in L^*$. Then,

A labelled transition system $p = \langle S, L, \to, s_0 \rangle$ will be identified by its initial state
$s_0$. So, e.g., we can write $\mathit{traces}(p)$ instead of $\mathit{traces}(s_0)$ and $p\ \mathbf{after}\ \sigma$ instead of
$s_0\ \mathbf{after}\ \sigma$.

We aim at modelling the behaviour of the HTTP protocol using labelled transition
systems. Therefore, we need to add restrictions on the traces in the labelled transition
system used for modelling this behaviour. One of these restrictions is that traces in the
LTSs should obey the alternating request/response behaviour.

Definition 4. Let $A$, $B$ be sets of labels. Then $\mathit{alt}(A, B)$ is the (infinite) set of traces
having alternating structure with respect to elements in $A$ and $B$, starting with an
element in $A$. Formally, $\mathit{alt}(A, B)$ is the smallest set such that

$$\varepsilon \in \mathit{alt}(A, B) \ \land\ \forall \sigma \in \mathit{alt}(B, A)\ \forall a \in A:\ a\sigma \in \mathit{alt}(A, B) .$$
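
Membership in alt(A, B) can be decided by a single pass over a trace. The sketch below follows the recursive structure of Definition 4 (the first label must be in A, and the rest of the trace must be in alt(B, A)); the function name `in_alt` and the example labels are illustrative.

```python
def in_alt(trace, A, B):
    """True iff `trace` is in alt(A, B): labels alternate A, B, A, ..., starting in A."""
    expected, other = A, B
    for label in trace:
        if label not in expected:
            return False
        # The remainder of the trace must be in alt(B, A): swap the roles.
        expected, other = other, expected
    return True


REQ = {"req"}
RESP = {"resp"}

if __name__ == "__main__":
    print(in_alt(["req", "resp", "req"], REQ, RESP))
    print(in_alt(["req", "req"], REQ, RESP))
```

Note that the empty trace is in alt(A, B), and that a trace may end with an element of A, i.e. with a request that has not yet been answered.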

As mentioned before, interactions with an Internet application can be grouped into ses- 
sions. To be able to specify the behaviour within each session, we make use of a pro- 
jection function. This function will be used for determining all interactions contained 
within one session. 

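Definition 5. Let $\sigma$ be a trace and $A$ be a set of labels. Then $\sigma \upharpoonright A$, the projection of $\sigma$
to $A$, is defined by

$$\varepsilon \upharpoonright A =_{\mathrm{def}} \varepsilon$$

$$(a \cdot \sigma) \upharpoonright A =_{\mathrm{def}} \begin{cases} a \cdot (\sigma \upharpoonright A) & \text{if } a \in A \\ \sigma \upharpoonright A & \text{if } a \notin A . \end{cases}$$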



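Definition 6. A partitioning $\mathcal{S}$ of a set $A$ is a collection of mutually disjoint subsets of
$A$ such that their union exactly equals $A$:

$$\bigcup \mathcal{S} = A \ \land\ \forall B, C \in \mathcal{S}:\ B \neq C \Rightarrow B \cap C = \emptyset$$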



Request-Response Transition Systems. We give a formal definition of a request-response
transition system, denoted by RRTS. RRTSs can be compared to input-output
transition systems (IOTSs) [11]. As in IOTSs, we differentiate between two sets of
labels, called request labels and response labels, respectively. RRTSs are based on pure
request/response alternation.

Definition 7. Let $L$ be a countable set of labels and $\{L_?, L_!\}$ be a partitioning of $L$.
Then, a request-response transition system $\langle S, L_?, L_!, \to, s_0 \rangle$ is a labelled transition
system $\langle S, L, \to, s_0 \rangle$ such that

$$\forall \sigma \in \mathit{traces}(s_0):\ \sigma \in \mathit{alt}(L_?, L_!) .$$

Elements in $L_?$ are called request labels, elements in $L_!$ response labels.

RRTSs resemble the notion of Mealy machines; however, it turns out to be technically
more convenient to start from the notion of RRTSs.
Multi Request-Response Transition Systems. IOTSs can be used as a basis for multi
input-output transition systems (MIOTSs) [12]. Similarly, in a multi request-response
transition system (MRRTS), multiple request-response transition systems are combined
into one. All subsystems behave like an RRTS; however, interleaving between
the subsystems is possible.

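Definition 8. Let $L$ be a countable set of labels. Let $\mathcal{L} \subseteq \mathcal{P}(L) \times \mathcal{P}(L)$ be a countable
set of tuples such that $\{A, B \mid (A, B) \in \mathcal{L}\}$ is a partitioning of $L$. Then, a multi
request-response transition system $\langle S, \mathcal{L}, \to, s_0 \rangle$ is a labelled transition system
$\langle S, L, \to, s_0 \rangle$ such that

$$\forall (A, B) \in \mathcal{L}\ \forall \sigma \in \mathit{traces}(s_0):\ \sigma \upharpoonright (A \cup B) \in \mathit{alt}(A, B) .$$

The set of all possible request labels, $L_?$, is defined by

$$L_? =_{\mathrm{def}} \bigcup_{(A,B) \in \mathcal{L}} A .$$

The set of all possible response labels, $L_!$, is defined by

$$L_! =_{\mathrm{def}} \bigcup_{(A,B) \in \mathcal{L}} B .$$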

Note that an RRTS $\langle S, L_?, L_!, \to, s_0 \rangle$ can be interpreted as MRRTS $\langle S, \{(L_?, L_!)\}, \to, s_0 \rangle$,
i.e., each MRRTS having singleton $\mathcal{L}$ is an RRTS.
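
The condition of Definition 8 can be checked on a single trace by combining the projection of Definition 5 with the alternation check of Definition 4. The following Python sketch does this for two hypothetical sessions; the label names and the function names are assumptions made for illustration.

```python
def project(trace, labels):
    """Projection of `trace` to `labels`: drop every label that is not in `labels`."""
    return [a for a in trace if a in labels]


def in_alt(trace, A, B):
    """True iff `trace` alternates between A and B, starting with an element of A."""
    expected, other = A, B
    for label in trace:
        if label not in expected:
            return False
        expected, other = other, expected
    return True


def respects_sessions(trace, pairs):
    """For every pair (A, B), the projection of `trace` to A u B must be in alt(A, B)."""
    return all(in_alt(project(trace, A | B), A, B) for A, B in pairs)


# Hypothetical request/response label sets of two sessions.
PAIRS = [({"?1"}, {"!1"}), ({"?2"}, {"!2"})]

if __name__ == "__main__":
    print(respects_sessions(["?1", "?2", "!2", "!1"], PAIRS))   # interleaved sessions
    print(respects_sessions(["?1", "?1"], PAIRS))               # two requests in a row
```

The first trace interleaves the two sessions but alternates correctly within each of them, which is exactly the freedom an MRRTS adds to an RRTS; the second trace sends two requests in the same session without an intermediate response.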

We introduce some extra functions on the sets of tuples as introduced in Definition 8. 

Definition 9. Let $\mathcal{L} \subseteq \mathcal{P}(L) \times \mathcal{P}(L)$ be a countable set of tuples, where each tuple
contains a set of request labels and a set of response labels. We define functions for
determining corresponding requests or responses given either a request label or response
label. For $x \in L$, we define functions $\mathit{req}, \mathit{resp} : L \to \mathcal{P}(L)$, such that

$$(\mathit{req}(x), \mathit{resp}(x)) \in \mathcal{L} \ \text{ and } \ x \in \mathit{req}(x) \cup \mathit{resp}(x) .$$

4.2 Relating Multi Request-Response Transition Systems 

An implementation conforms to a specification if an implementation relation exists be- 
tween the model of the implementation and its specification. We model both the imple- 
mentation and the specification as multi request-response transition systems, so confor- 
mance can be defined by a relation on MRRTSs. 

While testing Internet applications, we examine the responses sent by the applica- 
tion and check whether they are expected responses by looking at the specification. So 
we focus on testing whether the implementation does what it is expected to do, not what 
it is not allowed to do. 

Given a specification, we make use of function exp to determine the set of expected 
responses in a state in the specification. 




Automatic Conformance Testing of Internet Applications 






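Definition 10. Let $p$ be a multi request-response transition system $\langle S, \mathcal{L}, \to, s_0 \rangle$. For
each state $s \in S$ and for each set of states $S' \subseteq S$, the set of expected responses in $s$
and $S'$ is defined as

$$\mathit{exp}(s) =_{\mathrm{def}} \mathit{init}(s) \cap L_!$$

$$\mathit{exp}(S') =_{\mathrm{def}} \bigcup_{s' \in S'} \mathit{exp}(s') .$$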

If a model of an implementation i conforms to a specification s, the possible responses 
in all reachable states in i should be contained in the set of possible responses in the cor- 
responding states in s. Corresponding states are determined by executing corresponding 
traces in both i and s. 

Definition 11. Let MRRTS $i$ be the model of an implementation and MRRTS $s$ be a
specification. Then $i$ conforms to $s$ with respect to request-response behaviour,
$i\ \mathbf{rrconf}\ s$, if and only if all responses of $i$ are expected responses in $s$:

$$i\ \mathbf{rrconf}\ s =_{\mathrm{def}} \forall \sigma \in \mathit{traces}(s):\ \mathit{exp}(i\ \mathbf{after}\ \sigma) \subseteq \mathit{exp}(s\ \mathbf{after}\ \sigma) .$$

Relation rrconf on MRRTSs is analogous to relation conf on LTSs as formalised
in [13].
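
For finite models, the relation of Definition 11 can be explored up to a bounded trace length. The Python sketch below does so under explicit assumptions: transition relations are sets of triples, there are no internal transitions, init(s) is read as the set of labels of the outgoing transitions of s, and C after a is read as the set of states reachable from C by one a-transition. The vote-like labels and all function names are hypothetical.

```python
def after(transitions, states, label):
    """Assumed reading of `after`: states reachable from `states` by one `label` step."""
    return frozenset(d for (s, a, d) in transitions if s in states and a == label)


def exp(transitions, states, responses):
    """Expected responses: the response labels enabled in some state of `states`."""
    return {a for (s, a, _) in transitions if s in states and a in responses}


def rrconf_up_to(impl, impl_init, spec, spec_init, responses, depth):
    """Check exp(i after t) <= exp(s after t) for all spec traces t with len(t) <= depth."""
    frontier = {(frozenset({impl_init}), frozenset({spec_init}))}
    for _ in range(depth + 1):
        next_frontier = set()
        for i_states, s_states in frontier:
            if not exp(impl, i_states, responses) <= exp(spec, s_states, responses):
                return False
            # Extend the current specification trace by every label the spec allows.
            for a in {a for (s, a, _) in spec if s in s_states}:
                next_frontier.add((after(impl, i_states, a), after(spec, s_states, a)))
        frontier = next_frontier
    return True


# Hypothetical specification and two implementation models.
SPEC = {("s0", "?vote", "s1"), ("s1", "!ok", "s0"), ("s1", "!closed", "s0")}
GOOD = {("i0", "?vote", "i1"), ("i1", "!ok", "i0")}
BAD = {("j0", "?vote", "j1"), ("j1", "!error", "j0")}
RESPONSES = {"!ok", "!closed", "!error"}
```

`GOOD` never answers `!closed`, yet it still conforms: the relation only requires that the responses an implementation does give are expected, not that every expected response occurs. `BAD` is rejected after the trace `?vote`, where it can answer `!error`. A bounded search of this kind can refute conformance, but it establishes it only up to the chosen depth.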



4.3 Test Derivation 

An implementation is tested by performing experiments on it and observing its reactions 
to these experiments. The specification of such an experiment is called a test case. Ap- 
plying a test to an implementation is called test execution. By now we have all elements 
for deriving such test cases. 

Since the specification is modelled by an MRRTS, a test case consists of request 
and response actions as well. However, we have some more restrictions on test cases. 
First of all, test cases should have finite behaviour to guarantee that tests terminate. 
Apart from that, unnecessary nondeterminism should be avoided, i.e., within one test 
case the choice between multiple requests or between requests and responses should be 
left out. 

In this way, a test case is a labelled transition system where each state is either a
terminating state, a state in which a request is sent to the implementation under test, or
a state in which a response is received from the implementation. The terminating states
are labelled with a verdict which is either pass or fail.

Definition 12. A test case $t$ is an LTS $\langle S, L_? \cup L_!, \to, s_0 \rangle$ such that

- $t$ is deterministic and has finite behaviour;

- $S$ contains terminal states pass and fail with $\mathit{init}(\mathbf{pass}) = \mathit{init}(\mathbf{fail}) = \emptyset$;

- for all $s \in S \setminus \{\mathbf{pass}, \mathbf{fail}\}$, $\mathit{init}(s) = \{a\}$ for $a \in L_?$ or $\mathit{init}(s) = L_!$.

We denote this subset of LTSs by TESTS. A set of test cases $T \subseteq \mathit{TESTS}$ is called a
test suite.
We do not include the possibility for reaching inconclusive states in test cases. Such
verdicts are given if a component in the system under test, other than the application,
fails. The tester (as described in Section 3) is able to identify errors caused by the
application; these lead to a fail state. Other errors result in an inconclusive verdict.

As mentioned, we call a set of test cases a test suite. Such a test suite is used for
determining whether an implementation conforms to a specification. A test suite T is
said to be sound if and only if all implementations that conform to the specification pass
all test cases in T. If all implementations that do not conform to the specification fail a
test case in T, T is called exhaustive. Test suites that are both sound and exhaustive are
said to be complete [9].

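Definition 13. Let MRRTS $i$ be an implementation and $T$ be a test suite. Then,
implementation $i$ passes test suite $T$ if no traces in $i$ lead to a fail state:

$$i\ \mathbf{passes}\ T =_{\mathrm{def}} \neg \exists t \in T\ \exists \sigma \in \mathit{traces}(i):\ \sigma \cdot \mathbf{fail} \in \mathit{traces}(t)$$

We use the notation $\sigma \cdot \mathbf{fail}$ to represent trace $\sigma$ leading to a fail state, i.e.,
$\sigma \cdot \mathbf{fail} \in \mathit{traces}(t) =_{\mathrm{def}} t \stackrel{\sigma}{\Longrightarrow} \mathbf{fail}$.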

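Definition 14. Let $s$ be a specification and $T$ be a test suite. Then for relation rrconf:

$$\begin{array}{lcl}
T \text{ is sound} & =_{\mathrm{def}} & \forall i:\ i\ \mathbf{rrconf}\ s \implies i\ \mathbf{passes}\ T \\
T \text{ is exhaustive} & =_{\mathrm{def}} & \forall i:\ i\ \mathbf{rrconf}\ s \impliedby i\ \mathbf{passes}\ T \\
T \text{ is complete} & =_{\mathrm{def}} & \forall i:\ i\ \mathbf{rrconf}\ s \iff i\ \mathbf{passes}\ T
\end{array}$$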



In practice, however, such a complete test suite will often be infinitely large, and 
therefore not suitable. So, we have to restrict ourselves to test suites for detecting non- 
conformance instead of test suites for giving a verdict on the conformance of the imple- 
mentation. Such test suites are called sound. 

To test conformance with respect to request-response behaviour, we have to check 
for all possible traces in the specification that the responses generated by the implemen- 
tation are expected responses in the specification. This can be done by having the imple- 
mentation execute traces from the specification. The responses of the implementation 
are observed and compared with the responses expected in the specification. Expected 
responses pass the test, unexpected responses fail the test. The algorithm given is based 
on the algorithm for generating test suites as defined in [14]. 

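Algorithm 1. Let $s$ be MRRTS $\langle S, \mathcal{L}, \to, s_0 \rangle$. Let $C$ be a non-empty set containing all
possible states of the specification in which the implementation can be at the current
stage of the test. Initially $C = \{s_0\}$. We then define the collection of nondeterministic
recursive algorithms $\mathit{gentest}^n$ ($n \in \mathbb{N}$) for deriving test cases as follows:

$$\mathit{gentest}^n : \mathcal{P}(S) \to \mathit{TESTS}$$

$$\begin{array}{ll}
\mathit{gentest}^n(C) =_{\mathrm{def}} & [\ \ \mathbf{return}\ \mathbf{pass} \\
& [\,]\ n > 0 \land a \in L_? \land C\ \mathbf{after}\ a \neq \emptyset \to \\
& \quad \mathbf{return}\ a \cdot \mathit{gentest}^{n-1}(C\ \mathbf{after}\ a) \\
& [\,]\ n > 0 \to \\
& \quad \mathbf{return}\ \sum \{ b \cdot \mathbf{fail} \mid b \in L_! \setminus \mathit{exp}(C) \} \\
& \qquad +\ \sum \{ b \cdot \mathit{gentest}^{n-1}(C\ \mathbf{after}\ b) \mid b \in \mathit{exp}(C) \} \\
& ]
\end{array}$$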







The $\cdot$ infix notation is used for sequential composition. So, e.g., $a \cdot b$ relates to
transitions $s \xrightarrow{a} s' \xrightarrow{b} s''$. As mentioned, notation $a \cdot \mathbf{pass}$ and $a \cdot \mathbf{fail}$ is used for
representing transitions $s \xrightarrow{a} \mathbf{pass}$ and $s \xrightarrow{a} \mathbf{fail}$, respectively. We use $\sum$-notation to
indicate that it is not known which of the responses is returned by the implementation.
So, e.g. $a + b$ relates to transitions $s \xrightarrow{a} s'$ and $s \xrightarrow{b} s''$. Depending on whether the
response is expected, the algorithm might either continue or terminate in a fail state.

Although a choice for the first option can be made in each step, we added a parameter
to the algorithm, $n \in \mathbb{N}$, to force termination. As mentioned, we want all test cases
to be finite, since otherwise no verdict might be reached.
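
The three options of Algorithm 1 can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: a test case is represented as a nested dictionary from labels to subtrees with the strings "pass" and "fail" as terminal states, the nondeterministic choice is resolved by a seeded random generator, `after` and `exp` are read as in the usual LTS setting without internal transitions, and the vote-like specification is hypothetical.

```python
import random


def after(transitions, states, label):
    """Assumed reading of `after`: states reachable from `states` by one `label` step."""
    return {d for (s, a, d) in transitions if s in states and a == label}


def exp(transitions, states, responses):
    """Expected responses: the response labels enabled in some state of `states`."""
    return {a for (s, a, _) in transitions if s in states and a in responses}


def gentest(spec, requests, responses, C, n, rng):
    """Derive one test case of at most n transitions from the set of spec states C."""
    options = ["stop"]                                   # first option: return pass
    if n > 0:
        enabled = sorted(a for a in requests if after(spec, C, a))
        if enabled:
            options.append("request")                    # second option: send a request
        options.append("response")                       # third option: check a response
    choice = rng.choice(options)
    if choice == "stop":
        return "pass"
    if choice == "request":
        a = rng.choice(enabled)
        return {a: gentest(spec, requests, responses, after(spec, C, a), n - 1, rng)}
    expected = exp(spec, C, responses)
    test = {b: "fail" for b in responses - expected}     # unexpected responses fail
    for b in sorted(expected):                           # expected responses continue
        test[b] = gentest(spec, requests, responses, after(spec, C, b), n - 1, rng)
    return test


def depth(test):
    """Length of the longest path through a test case."""
    return 0 if isinstance(test, str) else 1 + max(depth(t) for t in test.values())


# Hypothetical specification with one request and three possible responses.
SPEC = {("s0", "?vote", "s1"), ("s1", "!ok", "s0"), ("s1", "!closed", "s0")}
REQUESTS = {"?vote"}
RESPONSES = {"!ok", "!closed", "!error"}

if __name__ == "__main__":
    print(gentest(SPEC, REQUESTS, RESPONSES, {"s0"}, 3, random.Random(1)))
```

Because every recursive call decreases n, each generated test case has at most n transitions on any path, which is the termination argument given above. Different seeds yield different test cases from the same specification.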

The set of test cases derivable from $\mathit{gentest}^n(C)$ is denoted by $\mathbf{gentest}^n(C)$. So
$\mathbf{gentest}^n(C)$ is the set of all possible test cases of at most $n$ transitions starting in states
$C$ of the specification. Although our goal is to generate sound test suites, we will prove
that in the limit, as $n$ reaches infinity, test suite $\bigcup_{n>0} \mathbf{gentest}^n(\{s_0\})$ is complete for
specification $\langle S, \mathcal{L}, \to, s_0 \rangle$. To prove this, we make use of some lemmas.

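Lemma 1. Let $s$ be a specification $\langle S, \mathcal{L}, \to, s_0 \rangle$ and $\sigma_0, \sigma_1 \in L^*$, $\sigma_1 \neq \varepsilon$. Then

$$\sigma_0 \sigma_1 \in \mathit{traces}(\mathbf{gentest}^n(\{s_0\})) \iff \sigma_1 \in \mathit{traces}(\mathbf{gentest}^{n-|\sigma_0|}(s_0\ \mathbf{after}\ \sigma_0))$$

where $|\sigma|$ is the length of trace $\sigma$.

Sketch of proof. This lemma can be proved by using induction on the structure of $\sigma_0$.

Lemma 2. Let $s$ be a specification $\langle S, \mathcal{L}, \to, s_0 \rangle$, $\sigma \in L^*$ and $n > 0$. Then

$$\sigma \cdot \mathbf{fail} \in \mathit{traces}(\mathbf{gentest}^n(\{s_0\})) \implies \exists \sigma' \in L^*\ \exists b \in L_!:\ \sigma = \sigma' b .$$

Proof. This can be easily seen by looking at the definition of the gentest algorithm:
State fail can only be reached after execution of a $b \in L_!$.

□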

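Theorem 1. Let $s$ be a specification $\langle S, \mathcal{L}, \to, s_0 \rangle$.

Then test suite $\bigcup_{n>0} \mathbf{gentest}^n(\{s_0\})$ is complete.

Proof. Let $s$ be $\langle S, \mathcal{L}, \to, s_0 \rangle$ and $T$ be $\bigcup_{n>0} \mathbf{gentest}^n(\{s_0\})$. Then,

$T$ is complete

$\equiv$ { definition of complete test suites }

$\forall i:\ i\ \mathbf{rrconf}\ s \iff i\ \mathbf{passes}\ T$

$\equiv$ { definition of rrconf and passes }

$\forall i:\ (\forall \sigma \in \mathit{traces}(s):\ \mathit{exp}(i\ \mathbf{after}\ \sigma) \subseteq \mathit{exp}(s\ \mathbf{after}\ \sigma))$
$\iff \neg \exists t \in T\ \exists \sigma \in \mathit{traces}(i):\ \sigma \cdot \mathbf{fail} \in \mathit{traces}(t)$

We prove this by proving exhaustiveness ($\impliedby$) and soundness ($\implies$) separately.

- Exhaustiveness.

$\forall i:\ (\forall \sigma \in \mathit{traces}(s):\ \mathit{exp}(i\ \mathbf{after}\ \sigma) \subseteq \mathit{exp}(s\ \mathbf{after}\ \sigma))$
$\impliedby \neg \exists t \in T\ \exists \sigma \in \mathit{traces}(i):\ \sigma \cdot \mathbf{fail} \in \mathit{traces}(t)$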
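We prove exhaustiveness by contradiction:

Let $\sigma \in \mathit{traces}(s)$ and $b \in \mathit{exp}(i\ \mathbf{after}\ \sigma)$ such that $b \notin \mathit{exp}(s\ \mathbf{after}\ \sigma)$. Then, we
prove that $\exists t \in T\ \exists \sigma' \in \mathit{traces}(i):\ \sigma' \cdot \mathbf{fail} \in \mathit{traces}(t)$.

$\exists t \in T\ \exists \sigma' \in \mathit{traces}(i):\ \sigma' \cdot \mathbf{fail} \in \mathit{traces}(t)$

$\impliedby$ { $b \in \mathit{exp}(i\ \mathbf{after}\ \sigma) \implies \sigma b \in \mathit{traces}(i)$, let $\sigma' = \sigma \cdot b$ }

$\exists t \in T:\ \sigma \cdot b \cdot \mathbf{fail} \in \mathit{traces}(t)$

$\impliedby$ { Definition of $T$ }

$\exists n > 0:\ \sigma \cdot b \cdot \mathbf{fail} \in \mathit{traces}(\mathbf{gentest}^n(\{s_0\}))$

$\equiv$ { Lemma 1 }

$\exists n > 0:\ b \cdot \mathbf{fail} \in \mathit{traces}(\mathbf{gentest}^{n-|\sigma|}(s_0\ \mathbf{after}\ \sigma))$

$\impliedby$ { gentest (third option), let $n > |\sigma|$, $b \notin \mathit{exp}(s_0\ \mathbf{after}\ \sigma) \implies b \in L_! \setminus \mathit{exp}(s_0\ \mathbf{after}\ \sigma)$ }

true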

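- Soundness.

$$
\begin{array}{l}
\forall i \ \ \forall \sigma \in \mathit{traces}(s) \ \ \mathit{exp}(i \ \mathbf{after}\ \sigma) \subseteq \mathit{exp}(s \ \mathbf{after}\ \sigma) \\
\quad \Rightarrow \neg \exists t \in T \ \exists \sigma \in \mathit{traces}(i) \ \ \sigma \cdot \mathbf{fail} \in \mathit{traces}(t)
\end{array}
$$

Soundness is also proved by contradiction:

Let $t \in T$ and $\sigma \in \mathit{traces}(i)$ such that $\sigma \cdot \mathbf{fail} \in \mathit{traces}(t)$. Then, by definition
of $T$, $\exists n > 0 \ \ \sigma \cdot \mathbf{fail} \in \mathit{traces}(\mathit{gentest}^n(\{s_0\}))$. Let $m > 0$ such that
$\sigma \cdot \mathbf{fail} \in \mathit{traces}(\mathit{gentest}^m(\{s_0\}))$. We prove that
$\exists \sigma' \in \mathit{traces}(s) \ \exists b \in \mathit{exp}(i \ \mathbf{after}\ \sigma') \ \ b \notin \mathit{exp}(s \ \mathbf{after}\ \sigma')$.

Let $\sigma'' \in \mathit{traces}(s)$ and $b'' \in \mathit{exp}(i \ \mathbf{after}\ \sigma'')$. Then, we prove that
$b'' \notin \mathit{exp}(s \ \mathbf{after}\ \sigma'')$. Since $\sigma \cdot \mathbf{fail} \in \mathit{traces}(\mathit{gentest}^m(\{s_0\}))$, using Lemma 2,
$\exists \sigma' \in \mathit{traces}(s) \ \exists b \in L_! \ \ \sigma = \sigma' \cdot b$. Let $\sigma = \sigma'' \cdot b''$. Then,

$$
\begin{array}{ll}
  & \sigma \cdot \mathbf{fail} \in \mathit{traces}(\mathit{gentest}^m(\{s_0\})) \\
= & \{\ \sigma = \sigma'' \cdot b''\ \} \\
  & \sigma'' \cdot b'' \cdot \mathbf{fail} \in \mathit{traces}(\mathit{gentest}^m(\{s_0\})) \\
= & \{\ \text{Lemma 1}\ \} \\
  & b'' \cdot \mathbf{fail} \in \mathit{traces}(\mathit{gentest}^{m-|\sigma''|}(s_0 \ \mathbf{after}\ \sigma'')) \\
\Rightarrow & \{\ \text{definition of algorithm gentest (third option)}\ \} \\
  & b'' \in L_! \setminus \mathit{exp}(s_0 \ \mathbf{after}\ \sigma'') \\
\Rightarrow & \{\ \text{set theory}\ \} \\
  & b'' \notin \mathit{exp}(s_0 \ \mathbf{after}\ \sigma'')
\end{array}
$$

□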

Algorithm 1 can easily be optimised. As can be seen by looking at the algorithm, 
each choice for inspecting a response of the implementation leads to $|L_!|$ new branches
in the generated test case. However, many of these branches will never take place as a 
result of the alternating request-response behaviour of the implementation: the imple- 
mentation can only send responses on requests sent by the tester. It can be proved that 
by adding only this restricted set of responses to the test cases, such optimised generated 
test suites are still complete. 
5 Example: Internet Vote

We show how the theory introduced in former sections can be used for testing real-life 
Internet applications. As an example, we take a voting protocol. All members of a group 
of voters are asked whether they are for or against a proposition. They are able to visit 
a web site where they can either vote or have a look at the current score. They can vote 
at most once and they can check the score as often as they want to. 

We start by giving an MRRTS that formally specifies the application. Let V be the 
set of voters and P = {for, against}. Then, 

- $L$, the set of tuples of transition labels, is defined as

$$
\begin{array}{rl}
L = & \{\ (\ \{\mathit{vote}(v,p)?_s \mid p \in P\},\ \{\mathit{ok}!_s, \neg\mathit{ok}!_s\}\ ) \mid v \in V, s \in \mathbb{N}\ \} \\
\cup & \{\ (\ \{\mathit{score}?_s\},\ \{\mathit{score}(f,a)!_s\}\ ) \mid f, a, s \in \mathbb{N}\ \} .
\end{array}
$$

The first part specifies the interactions where voter $v$ sends a request to vote for or
against the proposition ($p$). The response is a confirmation ($\mathit{ok}$) or a denial ($\neg\mathit{ok}$),
depending on whether the voter had voted before. The second part specifies the
requests for the score, which are answered with the number of votes for ($f$) and against
($a$) the proposition. All labels are extended with an identifier ($s$) for uniquely
identifying the sessions.

- The set of states $S$ is defined as $S = \mathcal{P}(L_?) \times \mathcal{P}(V) \times \mathcal{P}(V) \times \mathcal{P}(\mathbb{N} \times \mathbb{N} \times \mathbb{N})$.
For $\langle R, F, A, C \rangle \in S$,

• $R \subseteq L_?$ is the set of requests on which no response has been sent yet;

• $F \subseteq V$ is the set of voters who voted for the proposition;

• $A \subseteq V$ is the set of voters who voted against the proposition;

• $C \subseteq \mathbb{N} \times \mathbb{N} \times \mathbb{N}$ is the score at the moment that a request for the score is sent.
We need to keep track of this score for determining the possible results that can
be returned: the scores returned should be at least the scores at the time of
the sending of the request.

For $(s, f, a) \in C$,

* $s \in \mathbb{N}$ is the session identifier;

* $f \in \mathbb{N}$ is the number of voters who voted for the proposition;

* $a \in \mathbb{N}$ is the number of voters who voted against the proposition.

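- Let $s \in \mathbb{N}$, $v \in V$ and $p \in P$. Then, transition relation $\rightarrow$ is defined by the following
derivation rules.

If no session exists with identifier $s$, a session can be started by sending a request
to vote for or against the proposition or by sending a request for the current score:

$$
\frac{\mathit{score}?_s \notin R, \quad \neg\exists_{w \in V} \exists_{q \in P} \ \mathit{vote}(w,q)?_s \in R}
     {\langle R, F, A, C \rangle \xrightarrow{\mathit{vote}(v,p)?_s} \langle R \cup \{\mathit{vote}(v,p)?_s\}, F, A, C \rangle}
$$

$$
\frac{\mathit{score}?_s \notin R, \quad \neg\exists_{w \in V} \exists_{q \in P} \ \mathit{vote}(w,q)?_s \in R}
     {\langle R, F, A, C \rangle \xrightarrow{\mathit{score}?_s} \langle R \cup \{\mathit{score}?_s\}, F, A, C \cup \{(s, |F|, |A|)\} \rangle}
$$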







Harm M.A. van Beek and Sjouke Mauw 



If a request to vote for or against the proposition has been sent and the voter has not 
voted before, the vote can be processed and confirmed: 

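$$
\frac{\mathit{vote}(v, \mathit{for})?_s \in R, \quad v \notin F \cup A}
     {\langle R, F, A, C \rangle \xrightarrow{\mathit{ok}!_s} \langle R \setminus \{\mathit{vote}(v, \mathit{for})?_s\}, F \cup \{v\}, A, C \rangle}
$$

$$
\frac{\mathit{vote}(v, \mathit{against})?_s \in R, \quad v \notin F \cup A}
     {\langle R, F, A, C \rangle \xrightarrow{\mathit{ok}!_s} \langle R \setminus \{\mathit{vote}(v, \mathit{against})?_s\}, F, A \cup \{v\}, C \rangle}
$$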

If a request to vote has been sent and the voter has already voted before or the 
voter is concurrently sending a request to vote in another session, the vote can be 
rejected: 

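$$
\frac{\mathit{vote}(v,p)?_s \in R, \quad \left(\exists_{t \in \mathbb{N}, t \neq s} \exists_{q \in P} \ \mathit{vote}(v,q)?_t \in R\right) \lor v \in F \cup A}
     {\langle R, F, A, C \rangle \xrightarrow{\neg\mathit{ok}!_s} \langle R \setminus \{\mathit{vote}(v,p)?_s\}, F, A, C \rangle}
$$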

If a request for the score has been sent, the scores can be sent to the requesting 
client. Since interactions can overtake each other, the result can be any of the scores 
between the sending of the request and the receiving of the response. So, the score 
must be at least the score at the moment of requesting the score and at most the 
number of processed votes plus the number of correct votes, sent in between re- 
questing for the score and receiving the score: 

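$$
\frac{\begin{array}{l}
\mathit{score}?_s \in R, \quad (s, f, a) \in C, \\
f \leq f' \leq |F| + (\#_{v \in V} \ \exists_{t \in \mathbb{N}} \ \mathit{vote}(v, \mathit{for})?_t \in R \land v \notin F \cup A), \\
a \leq a' \leq |A| + (\#_{v \in V} \ \exists_{t \in \mathbb{N}} \ \mathit{vote}(v, \mathit{against})?_t \in R \land v \notin F \cup A)
\end{array}}
     {\langle R, F, A, C \rangle \xrightarrow{\mathit{score}(f',a')!_s} \langle R \setminus \{\mathit{score}?_s\}, F, A, C \setminus \{(s, f, a)\} \rangle}
$$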

- Initial state $s_0 = \langle \emptyset, \emptyset, \emptyset, \emptyset \rangle$: no requests have been sent yet, no one has voted for
and no one has voted against the proposition. 
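
The derivation rules above can be read as operations on the state tuple. The following Python sketch is only an illustration under stated assumptions, not the authors' tool: the names State, request_vote, request_score, vote_responses, respond_vote and score_bounds are hypothetical, requests are encoded as plain tuples, and the "concurrent request in another session" condition of the rejection rule is taken from the prose description of that rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    R: frozenset  # pending requests: ("vote", v, p, s) or ("score", s)
    F: frozenset  # voters who voted for the proposition
    A: frozenset  # voters who voted against the proposition
    C: frozenset  # score snapshots (s, f, a) taken when a score request is sent


INITIAL = State(frozenset(), frozenset(), frozenset(), frozenset())


def session_free(st, s):
    """True if no pending request carries session identifier s."""
    return all(req[-1] != s for req in st.R)


def request_vote(st, v, p, s):
    assert session_free(st, s)
    return State(st.R | {("vote", v, p, s)}, st.F, st.A, st.C)


def request_score(st, s):
    assert session_free(st, s)
    snapshot = (s, len(st.F), len(st.A))
    return State(st.R | {("score", s)}, st.F, st.A, st.C | {snapshot})


def vote_responses(st, v, p, s):
    """Responses the rules allow for the pending request vote(v,p)?_s."""
    assert ("vote", v, p, s) in st.R
    voted = v in (st.F | st.A)
    concurrent = any(r[0] == "vote" and r[1] == v and r[3] != s for r in st.R)
    allowed = set()
    if not voted:
        allowed.add("ok")        # confirmation rule
    if voted or concurrent:
        allowed.add("not_ok")    # rejection rule
    return allowed


def respond_vote(st, v, p, s, answer):
    assert answer in vote_responses(st, v, p, s)
    R = st.R - {("vote", v, p, s)}
    if answer == "ok" and p == "for":
        return State(R, st.F | {v}, st.A, st.C)
    if answer == "ok":
        return State(R, st.F, st.A | {v}, st.C)
    return State(R, st.F, st.A, st.C)


def score_bounds(st, s):
    """Inclusive (low, high) bounds for f' and a' in score(f',a')!_s."""
    _, f, a = next(c for c in st.C if c[0] == s)
    done = st.F | st.A

    def pending(p):
        # Voters with a pending vote for option p who have not voted yet.
        return len({r[1] for r in st.R
                    if r[0] == "vote" and r[2] == p and r[1] not in done})

    return (f, len(st.F) + pending("for")), (a, len(st.A) + pending("against"))
```

For instance, after a score request in session 1 and a pending vote for the proposition in session 2, score_bounds allows a "for" count of 0 or 1, reflecting that the vote may or may not have overtaken the score request.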

Labelled transition systems are well suited as a theoretical framework for automatic
conformance testing. However, as expected, using LTSs for giving specifications
of Internet applications is not convenient. To make this framework useful in practice,
we need a formalism in which these applications can be specified more easily. Therefore,
we are currently developing DiCons [15, 16], which is a formal specification language
dedicated to the domain of Internet applications. We will not give the actual DiCons
specification here since this goes beyond the scope of this paper. However, to give an
example, the Internet vote described above can be specified in DiCons in five lines of code.

As a proof of concept, we implemented an on-the-fly version of Algorithm 1. We
used this algorithm to test eleven implementations of the Internet vote application: one
correct and ten incorrect implementations. We tested by executing 26,000 test cases
per implementation. This took approximately half a day per implementation. We tested
using different lengths of test traces and different numbers of voters. The test results are
summarised in Table 1. The left column describes the error in the implementation. In
the second column, the percentage of test cases that ended in a fail state is given.

Table 1. Test results

    implementation                                              % failures   verdict
     1. correct implementation                                      0.00      pass
     2. no synchr.: first calculate results, then remove voter     33.30      fail
     3. no synchr.: first remove voter, then calculate results     32.12      fail
     4. votes are incorrectly initialised                          91.09      fail
     5. votes for and against are mixed up                         87.45      fail
     6. votes by voter 0 are not counted                           32.94      fail
     7. voter 0 cannot vote                                        91.81      fail
     8. unknown voter can vote                                      0.00      pass
     9. voters can vote more than once                             68.75      fail
    10. voter 0 is allowed to vote twice                           16.07      fail
    11. last vote is counted twice                                  8.82      fail

As can be seen, in nine out of ten incorrect implementations, errors are detected. In
all test cases, only requests are sent that are part of the specification, i.e., only requests
for votes by known voters are sent. Because we did not specify that unknown voters are
forbidden to vote, errors in the implementation that allow other persons to vote are not
detected: the implementation conforms to the specification.

The percentages in Table 1 strongly depend on the numbers of voters and lengths of
the test traces. Some errors can easily be detected by examining the scores, e.g. incorrect
initialisation. This error can be detected by traces of length 2: request for the score and
inspect the corresponding response. Other errors, however, depend on the number of
voters. If the last vote is counted twice, all voters have to vote first, after which the
scores have to be inspected. This error can only be detected by executing test traces
with at least a length of two times the number of voters plus two.
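
To make the bound concrete, the following sketch builds one shortest trace of the kind described: every voter sends a vote request and receives a confirmation, after which the score is requested and returned. The function name, the label strings and the session numbering are hypothetical and serve only to count labels.

```python
def last_vote_twice_trace(voters):
    """One shortest trace that can expose the 'last vote counted twice' error."""
    trace = []
    for session, voter in enumerate(voters):
        trace.append(f"vote({voter},for)?_{session}")  # request by the tester
        trace.append(f"ok!_{session}")                 # response of the implementation
    score_session = len(voters)
    trace.append(f"score?_{score_session}")            # request for the score
    trace.append(f"score(f,a)!_{score_session}")       # response to be inspected
    return trace


voters = ["v0", "v1", "v2"]
trace = last_vote_twice_trace(voters)
print(len(trace), 2 * len(voters) + 2)
```

Each vote contributes a request and a response, and the score inspection contributes two more labels, which gives the length 2|V| + 2 stated above.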

6 Related Work 

Automatic test derivation and execution based on a formal model has been an active 
topic of research for more than a decade. This research led to the development of a 
number of general purpose black box test engines. However, the domain of Internet
applications imposes some extra structure on the interaction behaviour of the
implementation, which necessitates adapting some of the key definitions involved. Therefore,
our work can be seen as an extension to and adaptation of the formal testing framework 
as introduced in [7-9]. The major difference stems from our choice to model an Internet 
application as a multi request-response transition system. We expect that existing tools 
(such as TorX [6]) can be easily adapted to this new setting. The reader may want to 
consult [3] for an overview of other formal approaches and testing techniques. 

Approaching the problem of testing Internet applications from another angle, one 
encounters methodologies and tools based on capture/replay (see e.g. [17, 18]). In the 
case of capture/replay testing, test cases are produced manually and recorded once, after 
which they can be applied to (various) implementations. These tools prove very ben- 
eficial for instance for regression testing. However, automatic generation of test cases 
has several advantages. In general it proves to be a more flexible approach, yielding test
suites that are easier to maintain and more complete, and that can be generated
more quickly (and thus more cheaply). The main disadvantage of automatic black box testing is
that it requires a formal model of the implementation under test.

A methodology that comes very close to ours is developed by Ricca and 
Tonella [19]. The starting point of their semi-automatic test strategy is a UML specifica- 
tion of a web application. This specification is manually crafted, possibly supported by 
re-engineering tools that help in modelling existing applications. Phrased in our terms, 
Ricca and Tonella consider RRTSs as their input format (which they call path expres- 
sions). We perform black-box testing, whereas they consider white-box testing. This 
implies that their approach considers implementation details (such as cookies), while 
we only look at the observable behaviour. White-box testing implies a focus on test 
criteria, instead of a complete testing algorithm. Finally, we mention the difference in 
user involvement. In our approach the user has two tasks, viz. building an abstract spec- 
ification and instantiating the test adapter which relates abstract test events to concrete 
HTTP-events. In their approach the user makes a UML model, produces tests and in- 
terprets the output of the implementation. For all of this, appropriate tool support is 
developed, but the process is not automatic. In this way derivation and execution of a 
test suite consisting of a few dozens of tests takes a full day, whereas our on-the-fly ap- 
proach supports many thousands of test cases being generated, executed and interpreted 
in less time. 

Jia and Liu [20] propose a testing methodology which resembles Ricca and Tonella’s
in many respects, so the differences with our work are roughly the same. Their focus 
is on the specification of test cases (by hand), while our approach consists of the gen- 
eration of test cases from a specification of the intended application’s behaviour. Their 
approach does not support on-the-fly test generation and execution. Like Ricca and 
Tonella, their model is equivalent to RRTSs which makes it impossible to test parallel 
sessions (or users) that share data. 

Wu and Offutt [21] introduce a model for describing the behaviour of Web ap- 
plications, which can be compared to the DiCons language. In contrast to the model 
presented in this paper, their model supports the usage of special buttons that are avail- 
able in most Web browsers. The main difference with our model is that they focus on 
stateless applications, i.e., responses only depend on the preceding request. We model
stateful applications which are based on sessions that are executed in parallel.

Another functional testing methodology is presented by Niese, Margaria and Steffen
in [22]. Where we focus on modelling Internet applications only, they model other
subsystems in the system under test as well. In their approach, test cases are not
generated automatically, but designed by hand using dedicated tools. Test execution takes
place automatically via a set of cooperating subsystem-specific test tools, controlled by
a so-called test coordinator.

Our research focuses on conformance testing only. Many other properties are im- 
portant for the correct functioning of web applications, such as performance, user in- 
teraction and link correctness [23]. Testing such properties is essentially different from 
conformance testing. They focus on how well applications behave instead of what they 
do. Plenty of tools are available for performance testing, e.g., [24, 25]. 
7 Conclusion

The research reported on in this paper is conducted in the context of the DiCons project 
(see [15, 16]). The goal of this project is the application of formal methods (especially 
process algebra) to the application of dependable Internet applications. One of the re- 
sults of this project is the development of the DiCons language, which is targeted to the 
specification of the interaction behaviour of Internet applications. The DiCons compiler 
allows for the generation of stand-alone Internet applications. 

Due to the focus of DiCons on interaction, rather than on presentation, it is likely 
that developers will prefer to use a less formal approach that supports the need for a nice 
user interface. However, our current research shows that development of a formal inter- 
action model, like in DiCons, still has benefits. Our research shows that there is a point 
in making a formal model, even if it is not used to generate Internet applications, since 
a formal model can be used for (automated) conformance testing of the application. 

The input of the testing process described in this paper is a multi request-response 
transition system which is a theoretically simple model, but which is very hard to use 
in practice for the specification of real applications. Since DiCons is targeted to spec- 
ify Internet applications and since its operational semantics is an MRRTS, we plan to 
connect the DiCons execution engine to our prototype testing tool. 

As the development of a formal model of an Internet application is quite an in- 
vestment, we expect that only in cases where it is vital that the application shows the 
correct interaction behaviour automated formal testing will be applied. However, there 
will be a huge gain in reliability and maintainability of the application (e.g. because of 
automated regression testing), compared with e.g. capture and replay techniques. 

Although we have only built a simple prototype, we can already conclude that the 
proposed testing approach works in practice, since it quickly revealed (planted) errors 
in erroneous implementations. Interestingly enough, playing with the prototype made it
clear that the response times in the HTTP-protocol are much slower than in traditional
window-based applications, resulting in fewer test runs per time unit. We cannot foresee
whether the unreliability of an Internet connection will prevent us from executing lengthy test
runs over the Internet.

An interesting point is that the actual HTTP-response of an Internet application has 
to be matched against the expected abstract event from the specification. In our current 
prototype tool we simply scan for the occurrence of certain strings, but this does not 
seem to be a safe and generic approach. Future research should answer the question of 
how to match actual HTTP-replies against abstract events. 
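
The string-scanning approach mentioned above can be pictured with a small sketch. It is not the prototype's code: the marker texts, the event names and the function match_event are hypothetical, chosen only to show why such matching is fragile (a page containing no marker, or more than one, cannot be mapped to a single abstract event).

```python
# Hypothetical marker strings; a real adapter would have to be instantiated
# for the concrete pages of the application under test.
MARKERS = {
    "ok": "Your vote has been registered",
    "not_ok": "You have already voted",
}


def match_event(body, markers=MARKERS):
    """Map an HTTP response body to an abstract event, or None if unclear."""
    hits = [event for event, text in markers.items() if text in body]
    # Exactly one marker must occur; otherwise the match is ambiguous or absent.
    return hits[0] if len(hits) == 1 else None


print(match_event("<p>Your vote has been registered.</p>"))
print(match_event("<p>Internal server error</p>"))
```

A change in page wording, or a page that happens to quote both marker texts, silently breaks the mapping, which is in line with the observation that this is not a safe and generic approach.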
Abstract. Dynamic interactions between a group of objects, for the realization 
of a use case or a complex operation of an object, may be specified by using 
UML collaboration diagrams. Collaboration defines the roles a group of objects 
play when performing a particular task and several aspects of the control related 
to their interactions. The specification described in a collaboration diagram 
must be preserved during the transformation process into an implementation. 
Test generation based upon collaboration diagrams is currently a poorly exploited
approach. The testing methodology (generation and verification) proposed
in this paper is based on the dynamic interactions between objects and
takes into account several aspects related to their control. It supports an incre- 
mental verification of the implementation of the use cases. The generated se- 
quences correspond to the different scenarios of the use cases, expected during 
the analysis and design phases. They allow verifying whether the implementa- 
tion of each use case is in accordance with the corresponding specification. We 
also present in this paper a brief summary of the environment that we developed 
for supporting the proposed use case driven testing process. 



1 Introduction 

Object technology has been widely used during the last decade, especially in
industry. Current industrial systems require the development of increasingly complex
software. Reliability is amongst their most important quality characteristics.
Undetected defects may have serious consequences not only for their quality,
but also for their development and maintenance costs. Software testing is recognized
as a vital part of their development process. It represents an important activity in their
quality assurance [14]. Object-Oriented Testing addresses important questions that are
not covered by the standard procedural testing approaches. These questions concern 
those aspects of software testing that arise specifically in relation to the concepts 
introduced by the object paradigm. An extensive body of literature exists to address 
the issues associated with testing object-oriented systems. Object-oriented systems 
can be tested at different levels. The basic unit for object-oriented systems is the class. 
Many authors have addressed several aspects of the unit testing of classes by considering
different approaches such as white-box, black-box and state-based testing techniques.
The large body of work on object-oriented testing (see for instance [5, 11, 12], amongst
several publications) has provided many interesting answers that take the concepts of the
object paradigm into account. However, relatively little work has been conducted at the
integration and system levels.

In object-oriented software, objects interact in order to implement the behavior. 
The dynamic interactions between a group of objects, for the realization of a system 
functionality (use case) or a complex operation of an object, may be specified by 
using UML collaboration diagrams. Collaboration defines the roles a group of objects 
play when performing a particular task and several aspects of the control related to 
their interactions. The specification described in a collaboration diagram must be 
preserved during the transformation process into an implementation. Consequently, 
just as we can envisage refining the specification into an implementation, we can also 
envisage using this specification for generating test sequences. Test generation based
upon a specification described in a collaboration diagram represents an interesting
approach, which is currently poorly exploited, as stated by A. Abdurazik and J. Offutt in [1].
These authors have also discussed the several advantages that such an approach presents.

The testing methodology (generation and verification) proposed in this paper is a 
new approach. It is based on the dynamic interactions between objects and takes into 
account several aspects related to their control. It supports an incremental verification 
of the implementation of the use cases. The generated sequences correspond to the 
different scenarios of the use cases, expected during the analysis and design phases. 
They allow verifying whether the implementation of each use case is in accordance 
with the corresponding specification. Use cases have been universally adopted for 
requirements specification as stated in [8]. Use cases start in requirements, are trans- 
lated into collaborations in analysis and design phases, and support test cases design 
in test. 

The rest of the paper is organized as follows: section 2 presents the methodology 
of our approach and the main phases of the proposed testing process. Section 3 pre- 
sents UML collaboration diagrams, some extensions that we propose to better specify 
the interactions between objects and the formal description of collaborations retained 
in the framework of our process. The main steps of the test sequences generation 
technique are presented in section 4. Section 5 gives a summary of the supported 
verification process. Section 6 presents the architecture of the environment that we 
developed for supporting our approach and illustrates it on a sample example. Section 
7 gives some conclusions and perspectives of the present work. 



2 Testing Process: Approach Methodology 

The methodology of the proposed object-oriented testing process is illustrated in
Figure 1. The testing process is divided into two main phases. The test sequences
generation technique, the first main phase, complements our previous work,
which was based on the individual behavior of objects [3]. It represents a refinement
and an extension of the technique proposed in [4]. It is based on a formal description
of the behavior of groups of objects, including the concept of message post condition
related to the interactions between objects. This concept, in particular, gives a solid
basis for the verification process, the second main phase of the proposed process. Each
generated sequence corresponds to a particular scenario of the considered use case.
Fig. 1. Testing Process Methodology 

The generated sequences allow verifying, during the testing process, whether the 
implementation of each use case is in accordance with the corresponding specifica- 
tion. These sequences take into consideration not only the dynamic interactions be- 
tween objects, but several aspects related to their control as well (preconditions, post 
conditions, sequencing, control structures related to the interactions, etc.). The
verification process is based on information, such as message post conditions, that is
extracted from the collaboration description during the analysis phase and integrated
automatically, during the instrumentation phase, into the code of the software under test.
The generated sequences are executed incrementally.

Abdurazik et al. [1] have stated that test sequences can be generated based on the 
sorting of the interactions within collaboration. This work constitutes, in our opinion, 
an interesting starting point in this area. Such a technique is certainly applicable
to simple collaborations, in which the sequence of messages is rather linear. In
complex collaborations, however, there might be, depending on the specified control,
several possible sequences for a single use case. The generated sequences have to allow
testing all possible cases described in the collaboration
(basic flow and its alternatives). Hence, the sequences have to take into account the
different aspects of the control and conditions leading to the interactions between 
objects involved in the collaboration. 



3 Collaboration Diagrams 

The different UML notations [6] allow specifying several aspects of object-oriented 
systems. Individual behavior of objects is described in the state charts. Dynamic in- 
teractions between objects are specified in collaboration diagrams. However, UML 




226 Mourad Badri, Linda Badri, and Marius Naha 



collaboration diagrams, in their current version [13], exhibit in our opinion a
weakness. Indeed, they allow specifying many important aspects of the collaboration
within a group of objects, such as the order of messages, the conditions related to
their execution, and some aspects related to the control in the collaboration. However,
they do not allow specifying precisely all of the control structures related to the
interactions. A deterministic and complete description of the control structures is
necessary to obtain the different possible scenarios in a collaboration. This description
will be taken as a basis for the test sequences generation process.

We propose some simple extensions illustrated in the example given by figure 2. 
They essentially allow a better expression of iteration according to a given condition. 
The two messages msg6() and msg7() are executed in the same iteration, which is a 
«repeat» iteration. In terms of control, hence messages sequences, this iteration is 
different from a «while» iteration as illustrated in figure 3. The distinction between 
the different types of iterations is performed by a code (F: for iteration, R: repeat 
iteration and W: while iteration), in replacement of the code <*> used in UML. Fur- 
thermore, since both messages msg6() and msg7() (Figure 2) are executed within the 
same loop, we just add a number to identify the actual loop (1 in the illustrated exam- 
ple), right after the iteration type. 
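
The difference between the two iteration types, in terms of message sequences, can be sketched as follows. This is an illustration only: it assumes the usual semantics in which a «repeat» body is executed at least once while a «while» body may not be executed at all, the function name iteration_sequences is hypothetical, and the bound max_iterations is introduced here merely to keep the enumeration finite.

```python
def iteration_sequences(kind, body, max_iterations):
    """Enumerate message sequences produced by a loop of the given kind.

    kind: "R" for a repeat iteration (body runs at least once),
          "W" for a while iteration (body may run zero times).
    body: list of messages executed in one pass through the loop.
    """
    if kind == "R":
        first = 1
    elif kind == "W":
        first = 0
    else:
        raise ValueError("only R and W iterations are covered by this sketch")
    return [body * count for count in range(first, max_iterations + 1)]


loop_body = ["msg6()", "msg7()"]
print(iteration_sequences("R", loop_body, 2))
print(iteration_sequences("W", loop_body, 2))
```

The only difference in the output is the empty sequence, which is possible for the while iteration but not for the repeat iteration; this is exactly the case a test sequence generator has to distinguish.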




Fig. 2. Example of a collaboration diagram 



These simple extensions, which to our knowledge do not exist explicitly in the
current formalism of UML collaboration diagrams [13], allow a precise
expression of the different control flow structures and conditions leading to the
interactions. Moreover, the formal verification process is based on message pre and post
conditions as illustrated in figure 2 (msg2() and msg6()). The post conditions are
integrated in the code of the application during the instrumentation phase.

Collaboration diagrams are described, at the current stage of our work, by using
CDL (Collaboration Diagrams Description Language), a language that we have
developed. CDL is a simple language allowing textual description of the semantic content
of a collaboration diagram. This gives a solid basis for the testing process, as we will
see in the following sections. In its current version, CDL essentially allows the
description of high-level information contained in the specification of collaboration
between objects (message starting the collaboration, different interactions, order of
interactions, control structures and the conditions leading to the interactions, post
conditions, etc.). In fact, CDL represents a complementary tool to UML and
OCL [16]. A CDL description represents a strong basis for the test sequences generation
process. Figure 4 gives an example of a CDL description. Moreover, CDL descriptions
will support the verification of the compliance of a certain part of the
obtained results with the expected results (based on the specified post conditions).
The CDL language also allows performing, during the design phase, several static
semantic controls.



Partial Collaboration Diagram 
Fig. 3. While and Repeat Sequences 



Message msg2()

Sequence expression : 1 a
Pre_condition : [Cond1]
Post condition : [PCond1]
Return value :
Arguments :
Receiver Object : B
Sender_object : A



Message msg3()

Sequence expression : I a.2 \\ I
Pre_condition : [Cond2]
Post condition :
Return value :
Arguments :
Receiver Object : C
Sender_object : B

Fig. 4. An example of CDL description 

The main objective of our work was essentially to develop the main phases of the
testing process. It is for this reason that we have developed a simple language (CDL)
allowing us to describe and analyze the collaboration diagrams. Formal notations
such as Z [15] and VDM [9] are certainly rich notations. However, they suffer from
the problem that they are very costly to introduce into software development
environments, as is the case with most formal methods. Moreover, these notations are not
adapted to support the complete description of collaboration diagrams. A notation such
as OCL [16], which was created for the distinct purpose of navigating UML models,
is ideal for describing constraints and expressing predicates when a system is modeled
using UML. Object-Z [7] presents, in our opinion, a good opportunity for
formally describing the collaboration diagrams, as stated in [2]. We plan, in our future
work, to examine the possibility of introducing a notation such as OCL or Object-Z. This
will improve the developed testing process by giving it a more solid basis.








4 Test Sequences Generation Process 

The test sequences generation technique takes into account the various aspects of the
control described in a collaboration diagram and related to the interactions between
objects. The main objective of the adopted approach is the generation of the set of
theoretical paths, from a given collaboration diagram, starting from the initialization
of the collaboration to the end, while considering the nature of the interactions
(conditional, unconditional, iterative, messages sequences, exclusion between
messages, etc.). Every path corresponds to a particular execution of the use
case and will be the subject of a particular test sequence. The generated sequences
will allow verifying whether the implementation of a use case is in accordance with
the corresponding specification. The test sequences generation technique is organized
in several steps.




Fig. 5. Messages control flow graph 



4.1 Messages Control Flow Graphs 

The objective of this stage is to perform an analysis of the CDL description of a col- 
laboration diagram, in order to construct a synthesis of the algorithms of the opera- 
tions involved in the collaboration. These algorithms will allow the construction of 
control flow graphs, restricted to the messages of each operation. Control flow graphs 
have been used in several conventional structural testing techniques. Moreover, they 
provide a global view of the control in the collaboration. Figure 5 presents a synthesis of
the algorithm and the messages control flow graph of the operation (msg1()) that
starts the collaboration described in Figure 2.

4.2 Messages Tree 

During this step we perform, first of all, an analysis of the different control flow
graphs corresponding to the operations involved in the collaboration. This analysis
allows generating the main messages sequence corresponding to the analyzed
diagram. This sequence will be taken as a basis for a complete messages tree construction
process. The starting point of the tree will be represented by the entry point of
the operation starting the collaboration. Figure 6 shows the main messages sequence
corresponding to the operation msg1().



A.msg1(), ( B.msg2() / C.msg3() ), D.msg6(), E.msg7(), { D.msg6(), E.msg7() }



Fig. 6. Main Sequences 




Fig. 7. Messages Tree 



We use several notations to express the different possibilities in the sequences
depending on the control. The notation {sequence} expresses zero or more executions
of the sequence. The notation (sequence 1 / sequence 2) expresses an alternative
between sequence 1 and sequence 2. The notation [sequence] indicates that the
sequence may or may not be executed. The main messages sequence, which corresponds
to the operation that starts the collaboration, is the basis of the messages tree
construction process. Each message of the sequence is replaced by its own main
sequence. The substitution process stops at the messages that constitute the leaves
of the tree, which correspond to operations that do not call any other operation in
the collaboration. Figure 7 illustrates the messages tree that corresponds to the
main sequence presented in Figure 6.



4.3 Test Sequences 

The proposed technique consists of generating, from the messages tree corresponding
to a collaboration diagram, the set of theoretical paths from the initialization of
the collaboration to its end, while taking conditions (pre conditions and post
conditions) into consideration. A particular test sequence corresponds to each
generated path. The generated sequences correspond to the different possible
executions of the implemented functionality. The sequences generated from the
messages tree correspond to all the theoretical paths of the tree, from the root to
the leaves. At this point, the objective is to identify, among the set T of possible
theoretical paths corresponding to a collaboration diagram CD, an interesting set
T_r of executable paths, by applying some reduction rules. The set T_r of paths
obtained after reduction is used as a basis for generating the set S of test
sequences. To each path of T_r corresponds a test sequence from S, which represents
a particular scenario of the use case. The rules retained at the current stage of our
technique are:

Infeasible paths: The analysis of predicates makes it possible to detect conflicting
predicates in certain generated paths. These paths cannot be executed and are removed
from the set T.

Reduction of paths including cycles: Collaboration diagrams that include iterative
interactions may present a large number (infinite in some cases) of paths. For obvious
reasons, we cannot test all of these paths. We do, however, adopt the hypothesis that
these paths constitute a family of similar paths (iterative part of the sequence) and
that testing only one path (a single iteration) is sufficient. This hypothesis enables
a considerable reduction in the number of potential paths. Figure 8 shows the sequences
generated from the messages tree of Figure 7 after reduction.



A.msg1(), B.msg2(), D.msg6(), E.msg7(), D.msg6(), E.msg7()
A.msg1(), C.msg3(), D.msg6(), E.msg7(), D.msg6(), E.msg7()
A.msg1(), B.msg2(), D.msg6(), E.msg7()
A.msg1(), C.msg3(), D.msg6(), E.msg7()



Fig. 8. Generated sequences 
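
The expansion of a main sequence into test sequences can be illustrated with a
short sketch. It is not the authors' tool: the nested-list encoding of the notation
and the names expand and MAIN_SEQUENCE are assumptions made for this example. An
iterative part {sequence} is expanded to zero or exactly one iteration, following
the single-iteration hypothesis stated above.

```python
from itertools import product

def expand(sequence):
    """Return every path through `sequence` as a tuple of message names.

    Hypothetical encoding: a plain string is a message, ("alt", [s1, s2])
    is (s1 / s2), ("iter", s) is {s} and ("opt", s) is [s].
    """
    choices = []  # one list of alternative sub-paths per element
    for item in sequence:
        if isinstance(item, str):
            choices.append([(item,)])
            continue
        kind, body = item
        if kind == "alt":
            # Exactly one of the branches is taken.
            choices.append([p for branch in body for p in expand(branch)])
        elif kind in ("iter", "opt"):
            # Skipped, or executed once (single-iteration hypothesis).
            choices.append([()] + expand(body))
        else:
            raise ValueError(f"unknown construct: {kind}")
    # Concatenate one choice per element, in order.
    return [sum(parts, ()) for parts in product(*choices)]

# Main sequence of Figure 6.
MAIN_SEQUENCE = [
    "A.msg1()",
    ("alt", [["B.msg2()"], ["C.msg3()"]]),
    "D.msg6()",
    "E.msg7()",
    ("iter", ["D.msg6()", "E.msg7()"]),
]

paths = expand(MAIN_SEQUENCE)
for path in paths:
    print(", ".join(path))
```

Under these assumptions the sketch yields the four sequences of Figure 8, although
not necessarily in the same order. Infeasible paths are not filtered here, since
that requires the predicate analysis described above.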



5 Verification Process 

5.1 Testing Process: Main Phases 

The testing process presented in this paper is incremental. It is organized in several
phases (Figure 9). The objective is to execute, for each use case, all the retained
sequences at least once and to verify that their execution conforms to the
corresponding specification. The idea is also to verify, when executing the use case,
that the executed sequence of messages corresponds to the expected one, according to
the provided input data.

The verification process is supported partly by the instrumentation of the program,
which adds to the code some operations that make it possible to track, through
execution analysis, whether the sequence was executed correctly. Any deviation of the
executed sequence from the expected one is considered a failure. The verification
process is also supported by information extracted from the collaboration description
(essentially the post conditions in the current version of the environment) and
integrated into the code of the application, as private methods in the receiver class,
during the instrumentation phase. The user intervenes (Figure 10) during this process
to verify the results of the use case execution that are not specified in the post
conditions.

For each use case described by a collaboration diagram
    Generation of the corresponding test sequences
    For each sequence S_i
    {
        Defining input data and the expected results
        Execution of the software
        Checking the executed messages sequence path
        Checking the results
        Reduction of the tested sequence (testing coverage)
    }
Testing coverage: tested use cases.



Fig. 9. Testing process phases 
Fig. 10. Environment architecture



5.2 Testing Criteria

The generated sequences are executed incrementally. On the basis of the tracking
analysis of several executions, this process determines the executed sequences
(sequence coverage: main scenario and its different extensions) and hence the
sequences that remain to be executed. At the current stage of our research, we have
defined two types of testing criteria and the corresponding coverage:

Interactions Between Methods. Each interaction in the collaboration diagram must
be executed at least once. The interactions coverage (IC) is defined as: IC = Number
of executed interactions / Total number of interactions in the collaboration diagram.

Message Sequence Path. Each retained sequence must be executed at least once.
Each sequence corresponds to an implemented messages sequence path. The sequences
coverage (SC) is then defined as: SC = Number of executed sequences /
Number of retained sequences.
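
As a worked illustration of the two ratios, the following sketch computes IC and SC
for the sequences of Figure 8. The function names are hypothetical, and treating
each distinct message as one interaction is an assumption of this example, not a
definition taken from the paper.

```python
def interaction_coverage(executed, all_interactions):
    """IC = executed interactions / total interactions in the diagram."""
    total = set(all_interactions)
    return len(set(executed) & total) / len(total)

def sequence_coverage(executed_sequences, retained_sequences):
    """SC = executed sequences / retained sequences."""
    retained = {tuple(s) for s in retained_sequences}
    done = {tuple(s) for s in executed_sequences} & retained
    return len(done) / len(retained)

# Retained sequences of Figure 8.
RETAINED = [
    ["A.msg1()", "B.msg2()", "D.msg6()", "E.msg7()", "D.msg6()", "E.msg7()"],
    ["A.msg1()", "C.msg3()", "D.msg6()", "E.msg7()", "D.msg6()", "E.msg7()"],
    ["A.msg1()", "B.msg2()", "D.msg6()", "E.msg7()"],
    ["A.msg1()", "C.msg3()", "D.msg6()", "E.msg7()"],
]
# Assumption: one interaction per distinct message of the diagram.
INTERACTIONS = {m for seq in RETAINED for m in seq}

# Suppose only the third sequence has been executed so far.
executed_runs = [RETAINED[2]]
ic = interaction_coverage(RETAINED[2], INTERACTIONS)
sc = sequence_coverage(executed_runs, RETAINED)
print(f"IC = {ic:.2f}, SC = {sc:.2f}")  # IC = 0.80, SC = 0.25
```

After this single run, four of the five interactions are covered (C.msg3() is not),
while only one of the four retained sequences has been executed.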



6 Environment Supporting Our Approach 

We believe that the environment that we have developed represents an interesting
framework for the validation of the use cases of a system. It is composed of several
tools, as illustrated in Figure 10. Let us consider the example collaboration diagram
given in Figure 11. This example was taken from [10] and adapted for the needs of our
paper. Figure 12 illustrates the result of the analysis of the corresponding CDL
description and the generated sequences.




Fig. 11. Example of collaboration diagram 
Fig. 12. A CDL description and the corresponding generated sequences



Fig. 13. Verification process: Sequences tracking

Figure 13 illustrates the result of the verification process during the execution of
the selected sequences. Figure 13-3 presents a case where the environment indicates
to the tester that the executed sequence is in accordance with the expected one.
Figure 13-1 presents a case where the post condition related to a message of the
sequence is not valid. In such cases, the tester is informed about the detected
failure. Moreover, because an error exception is raised, the tester knows exactly
which part of the tested sequence has failed. This is useful for debugging the
program. Figure 13-2 gives a case where the executed sequence is not in accordance
with the expected one. The environment stops the execution at the first deviation of
the sequence from the expected one. This is also useful during the debugging process.
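
The sequence tracking check described above can be sketched as a comparison that
reports the position of the first deviation. This is a minimal illustration with a
hypothetical function name; the actual environment works through instrumented code
and is not reproduced here.

```python
def first_deviation(expected, executed):
    """Return the index of the first mismatch, or None if the runs agree.

    A run that stops early, or continues past the expected sequence,
    deviates at the length of the shorter sequence.
    """
    for index, (want, got) in enumerate(zip(expected, executed)):
        if want != got:
            return index
    if len(expected) != len(executed):
        return min(len(expected), len(executed))
    return None

expected = ["A.msg1()", "B.msg2()", "D.msg6()", "E.msg7()"]
executed = ["A.msg1()", "C.msg3()", "D.msg6()", "E.msg7()"]

position = first_deviation(expected, executed)
if position is None:
    print("executed sequence matches the expected one")
else:
    print(f"deviation at message {position}: expected {expected[position]}")
```

Reporting only the first deviation mirrors the behaviour described for Figure 13-2,
where execution stops as soon as the sequence departs from the expected one.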



7 Conclusions and Future Work 

The object-oriented testing process proposed in this paper is a new approach. It is
based on the dynamic interactions between the different groups of objects, specified
in the collaboration diagrams. It offers many advantages in comparison with a
code-based generation approach: it makes it possible to highlight possible omissions
with regard to the specification, on the one hand, and design errors related to the
different interactions between objects, on the other hand. In addition, generating
test sequences early in the development process allows a better preparation of the
testing process.

The test sequences generation technique proposed in the present article is based on
a formal description of the behavior of groups of objects, particularly the concept
of message post condition. This gives a solid basis for the verification process.
For each use case (or complex operation of an object), the technique generates a set
of appropriate test sequences from the analysis of the formal description of the
corresponding collaboration diagram. These test sequences take into consideration
not only the dynamic interactions between objects, but also the different aspects
related to their control. Each sequence corresponds to a particular scenario of the
use case, expected during the analysis and design phases. The objective is to cover,
using the generated sequences, a large number of the different possible executions
of a use case and to check, thanks to the supported verification process, the
conformance of its implementation to its specification.

The environment developed to support our approach has been tried out on
simple Java projects. In future work, we plan to use it on real projects. We also
plan to extend our approach by introducing a formal notation such as OCL or Object-Z,
and to extend the environment to other object-oriented languages such as C++.



Abstract. We first describe the work on validating interoperable distributed
software and systems (the VISWAS method [1]), which extends UML
models to include testability aspects with design by contract notions. In
VISWAS, test sequences are generated automatically from the extended
Message Sequence Charts (MSCs) and Live State Charts with a
temporal action propagation list. Since testability is not independent of
the diagnostic process, diagnosability was not explicitly stated as part of
the SDLC and testing process in VISWAS. Next, we present our current
work on capturing diagnosis flows in MSCs, and some discussion on why
standards such as IEEE Std P1522, UML 2.0 Testing Profile, MSC2000
and TTCN-3 are useful for consideration in capturing diagnosability
aspects in a testing environment.



1 Introduction 

For validation testing in an O-O context, the testing strategy is broadened to
incorporate the review of analysis and design models. UML models and technology
have primarily been used in defining static system structures and dynamic
behaviours. Before the OMG RFP on a UML testing profile (refer:
http://www.fokus.gmd.de/u2tp) in July 2001, UML officially provided only limited
mechanisms for describing test procedures, although UML was extended to cater for
such tasks by various research groups and industry-based projects. With the
recent trend towards a systems engineering approach with model driven architectures
and automatic code generation, the need for conformance testing has increased.

1.1 Design for Testability 

Testability and diagnosability are related attributes that must be taken into ac- 
count when building, measuring and predicting the correctness and robustness 
of software components. Testability is viewed as a software design quality char- 
acteristic which allows the status of an item under test to be determined and 
the detection of faults within the item under test to be performed effectively. 
Diagnosability is a wider notion than testability: it encompasses finding
information or an explanation about the state of a system under test, whether or
not faults are present.



A. Petrenko and A. Ulrich (Eds.): FATES 2003, LNCS 2931, pp. 236-251, 2004. 
© Springer-Verlag Berlin Heidelberg 2004

Software testability is defined in the IEEE Standard Glossary of Software
Engineering Terminology - IEEE Std. 610.12 - 1990 as “(1) the degree to which
a system or component facilitates the establishment of test criteria and the
performance of tests to determine whether those criteria have been met, and (2)
the degree to which a requirement is stated in terms that permit establishment
of test criteria and performance of tests to determine whether those criteria have
been met” [2]. According to this definition, one needs to have appropriate test
criteria to determine the degree of testability. Hence, testability is a measure
of the difficulty in satisfying a specific testing goal. Testing can reveal faults
whereas testability can suggest places where faults can hide from testing. Voas
[2] defines “testability” of a program P to be the probability that a particular
testing strategy will force failures to be observable if faults were to exist and
the probability that the oracle will be precise enough to detect failures. This
approach is different from the IEEE definition above.

As software testing is used to show conformance between specification and 
implementation, a good software engineering practice requires that testability 
be addressed by precise and clear specifications and rules to map between var- 
ious phases of software cycle, for example, between implementation and design 
models. 

Testability of O-O systems is lower because O-O design impacts on the
controllability and observability of objects or components under test. A practical
engineering method called VISWAS for testing the safety and liveness properties
of distributed reactive systems using a mainstream method such as UML was
developed in [1]. In the primary author’s work, the testing method called VISWAS
(Validating interoperable distributed software and systems) [1] attempts to:

— overcome the limitations of UML/OCL in expressing temporal constraints 
for expressing safety and liveness properties; precisely specify these con- 
straints by expressing these extensions to MSCs and State Charts in the 
Temporal Logic formalism and Temporal Logic of Actions (TLA) 

— integrate MSC and State Charts into the testing process; develop a repeat- 
able model engineering process for a testing method 

— develop an automated specification test generation tool as part of the test 
method. 



1.2 UML2.0 Testing Profile 

UML models focus primarily on system structure and behaviour, and do not
provide details for specifying test procedures and objectives. In June 2001, an
OMG RFP on a UML 2.0 Testing Profile (UTP) was initiated to address this gap.
The UML 2.0 Testing Profile is based on recent work on testing such as
TTCN-3 [3]. UTP has the notion of an arbiter, a new test component
aimed at separating test behaviour from test evaluation. Some of the other major
terms in UTP are test architecture, test data and test behaviour. The test
architecture is a set of related classes and/or components from which test cases
can be specified. The test data package contains data sent to the system under test
(SUT) and received from the SUT. A test configuration contains the test components
and the SUT. An arbiter is a specific test component that evaluates test results
and assigns the verdict of a test case. A verdict is the outcome of a test case:
pass, fail, inconc or error, as defined in TTCN-3. During the execution of
a test case, a test trace is generated and stored in the test log.
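
The role of the arbiter can be illustrated with a small sketch that combines the
local verdicts of several test components into one overall verdict. The ordering
none < pass < inconc < fail follows the verdict overwriting rule of TTCN-3. Ranking
error above fail, as well as the names SEVERITY and arbitrate, are assumptions of
this example and are not taken from UTP.

```python
# Verdicts ordered from weakest to strongest. Placing "error" last is an
# assumption of this sketch; in TTCN-3 it is set by the test system itself.
SEVERITY = {"none": 0, "pass": 1, "inconc": 2, "fail": 3, "error": 4}

def arbitrate(local_verdicts):
    """Return the overall verdict: a stronger verdict overwrites a weaker one."""
    overall = "none"
    for verdict in local_verdicts:
        if SEVERITY[verdict] > SEVERITY[overall]:
            overall = verdict
    return overall

print(arbitrate(["pass", "inconc", "pass"]))  # inconc
print(arbitrate(["pass", "fail", "inconc"]))  # fail
```

Keeping this logic in one component is what separates test evaluation from test
behaviour: the components only report local verdicts, and the arbiter decides.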



1.3 IEEE Std P1522 

The Artificial Intelligence Exchange and Service Tie to All Test Environments
(AI-ESTATE) standards are product information standards for test and diagnosis.
The 1232 family of standards was developed to provide standard exchange
formats and software services for reasoning systems used in system test and
diagnosis. As the information models for these standards became more complex,
and systems became more difficult to diagnose and repair, the IEEE Std P1522
initiative got under way to standardise testability and diagnosability metrics. The
metrics of P1522 are derived from the information models in IEEE 1232. Reference [4]
states that the main purpose of testing is diagnosis and provides a detailed
dynamic context model showing the relationships in a test/diagnosis session.
P1522 includes both testability and diagnosability, and diagnosability includes all
aspects of fault detection, fault localization and fault identification.

Diagnosability was not the focus of concern in VISWAS, and the test execution
phase of the test environment in VISWAS was not automated. We are exploring
the IEEE P1522 standard and the UML2.0 Testing Profile for modelling and
automating the test execution phase of the test environment in VISWAS.



1.4 From Testability to Diagnosability Aspects 

While testability is concerned with detecting faults, the diagnosability aspect of
testing is about isolating faults and pointing out their location so that they can
be repaired or corrected.
Software testing involves executing code with test input and determining the 
outcome of the test. We look at testing from two complementary objectives: 

— conformance-directed when the intent is to achieve conformance to require- 
ment specification 

— fault-directed when the intent is to reveal faults through failures. We treat 
this as a diagnosis activity. 

The word Diagnosis is derived from two Greek words and is defined as to distin- 
guish, and in a legal sense, to examine and offer an opinion. Our objective is to 
ascertain with confidence the state of the software components under test such 
as correctness, robustness including faults and failures. Testing is considered 
effective when it reveals existing faults. 

We present the VISWAS method in the next sections. In Section 3, we show
how the test oracle in a test environment (see Figure 1) could be expanded to
provide detailed diagnostic information. In Section 4, we describe how the
diagnosability models for such software systems would benefit from using IEEE
Standard P1522, the UML2.0 Test Profile, MSC2000 and the TTCN-3 Graphical Notation.
This will also involve capturing the diagnosis flow in Figure 3 by using next
event/signal relations in MSCs (see Figure 7 and [5]). The diagnosability models
may start to look cluttered and decidedly procedural using the UML2.0 Test Profile
or MSC2000 and TTCN-3, as can be seen from [6,7]. However, in the mission-critical
applications domain, in the software component market and in embedded software
systems with hardware/software, it is important to capture the diagnosis flow between
objects or within an entity graphically in order to communicate our diagnosis
requirements unambiguously in the detailed design phase. Such diagnostic flows
and verdict information from the oracle can be used to develop software quality
measures by collecting and analysing the various diagnosis information.

2 A Test Method for Validating Interoperable 
Distributed Software and Systems (VISWAS) 

2.1 Defining VISWAS 

In concurrent and distributed modelling, two properties called safety and liveness
are important to consider. Mutual exclusion and absence of deadlock are important
safety properties in developing concurrent models and programs. A progress
property is a restricted form of liveness property. It asserts that, from any state
that a system is in, it is always the case that a specified action will eventually
be executed. A test environment needs to be defined for
the development of this testing method (VISWAS) for distributed systems. The
problem considered is the generation of test sequences at a tool level from an
O-O model representation of the distributed system, taking into account assertions
on time-related constraints. An O-O model that is testable and can be used for
testing is needed to design the test software. A suitable model of the distributed
system is required for representing the system as a testable dynamic model. In order
to reduce the manual effort and improve the accuracy of the output, one of the
requirements for testing is to automate the test sequence generation.
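
The progress property described above is commonly written in temporal logic as
always (P implies eventually A). The sketch below checks that shape on a finite
recorded trace. It is only a bounded approximation, since a liveness property
cannot be refuted by a finite prefix in general, and the names holds_progress,
trigger and action are hypothetical.

```python
def holds_progress(trace, trigger, action):
    """Check 'always (trigger implies eventually action)' on a finite trace.

    `trace` is a list of events; `trigger` and `action` are predicates.
    A trigger is discharged by an action at the same or a later position.
    """
    pending = False
    for event in trace:
        if trigger(event):
            pending = True
        if action(event):
            pending = False
    # Any trigger still pending at the end was never followed by the action.
    return not pending

is_request = lambda e: e == "request"
is_grant = lambda e: e == "grant"

print(holds_progress(["request", "work", "grant"], is_request, is_grant))
print(holds_progress(["request", "grant", "request"], is_request, is_grant))
```

The first trace satisfies the property because the request is eventually granted.
The second does not, because the last request is still pending when the trace ends.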

2.2 A Generic Test Environment 

The test environment for VISWAS is a specific case of the generic test
environment (see Figure 1).

The test environment includes both the test sequence generation and test 
execution phases. The generic testing environment shown in Figure 1 has three 
main aspects: 

1. The test model, which is an application model augmented to include testing
requirements by adding application-specific constraints as part of the object 
or component’s interface protocol, 

2. The test model (state model) is fed to the test design software. This software 
tool uses the critical properties of testing requirements built in the test model 
to generate automated test sequences. 

Fig. 1. Generic testing environment

3. The test execution phase is the diagnosis phase and accepts test inputs from 
the previous phase for the system under test, and the test is evaluated as 
pass or fail. The components shown in the test execution phase in Figure 1 
are used in the validation of system under test. 



2.3 Modelling for VISWAS 

The model engineering process in VISWAS is defined by a three-layered
metamodel to facilitate the development of detailed models for implementation and
testing of concurrent and distributed O-O components.

The three layers of the VISWAS architecture as shown in Figure 2 from the 
model engineering view point are: 

1. Representational layer - to model the given concurrent and distributed sys- 
tem to be tested 

2. Assertional layer - to capture the safety and liveness properties of the given 
system for testing 

3. Tool layer - to develop software testing tools, for example, to automate test 
sequence generation in VISWAS 

The three layered model is used to create a customised process specifically 
suited to the testing requirements of an interoperable distributed software sys- 
tem. 

The representational layer has been chosen to be a distributed architectural
model that supports interoperability and addresses the testability concerns of
distributed software systems. The distributed architectural model of Ken-Gate by
Schmidt [8] fits this requirement, and is used to represent the architecture of the
software system. As part of this layer, an implementation model is also derived
from the static architectural model to represent the distributed software system
under test. Industry standard tools and languages such as CORBA/Java IDL
and Java are used in the implementation model for the software system under
test.

Fig. 2. Modelling layers in VISWAS test method

The Assertional layer is used to represent the dynamic software model with
state machines and MSCs. As part of the dynamic model, temporal properties
must be included explicitly in order to capture the safety and liveness
requirements for distributed component testing. However, since current methods
such as the UML and OPEN dynamic models do not support any temporal
operators, OCL extended with temporal logic operators is used in MSCs,
and the Temporal Logic of Actions (TLA) formalism is applied to the state
machine. As shown in Figure 2, temporal operators and the safety and liveness
properties of TLA are added to the UML state machine to produce an extended statechart
with constraints on preceding and succeeding events. These constraints, together
with the extended design by contract from the distributed architectural model,
represent the temporal contracts that are modelled as part of validating an
interoperable distributed system (VISWAS).

The Tool layer is used to represent the automated VISWAS testing tool, and
supports the testability needs of a distributed software system. The extended live
state chart with its temporal contracts, and event patterns that take into account
important safety properties (such as mutual exclusion and absence of deadlock,
expressed with always and never) and liveness properties (such as eventually
enabled), are fed to the tool layer to generate a functional test sequence generator.


3 An Automated Specification-Based Testing Tool

In the development of VISWAS, a precise meaning has been provided for the various
elements in the example MSC for a Robot. It is described briefly in
Subsection 3.1. More details are available in [9,1]. Subsection 3.2 provides details
of the Live State Chart, and how it is used in the automated test sequence
generation. The test sequence generator is one of the components of the testing
environment. A test environment for an automated specification-based testing tool
for Distributed Object Software Testing (DOST) in the VISWAS method is presented
in Subsection 3.3.

3.1 Message Sequence Chart of a Robot Component 

Our MSC for a Robot (see Figure 3) has a clear definition of the mandatory vs 
provisional elements of MSC. It is therefore possible that one can proceed with 
defining the interaction semantics on the horizontal dimension of MSC using 
either OCL or events with guards. Guards may be empty or contain predicates 
which must be met for the interaction to occur. 







Fig. 3. Message sequence chart of a robot component 



We have described MSC object interactions precisely in terms of what, when
and where they occur. Our Robot example (Figure 3) describes a robot drop operation
scenario. There are six object/thread instances: Client Robot and its three
threads (robotMotor1, armMotor1, gripperMagnet1), CORBA (InteractionRulesImpl)
and a Server, each of which is depicted with a line on the vertical
axis. Each of these instances is represented by a rectangle with its name
underlined, followed if appropriate by the thread or CORBA specific stub or skeleton
name. The thread creation messages from the Client life line to the three thread
objects are shown coming from a coregion. The coregion is drawn as a dashed
vertical line. The events in the coregion are not ordered, whereas events in an
object instance are totally ordered along the vertical line. A hexagon crossing over
an instance represents the condition to be satisfied by the instance during execution.
For example, the robotMotor1 thread object must be in the arm1PointstoPress
state at the beginning of this scenario. Shared conditions are represented by
a hexagon crossing over two or more objects. A horizontal arrow means
communication via message passing between two instances. For example, the message
rotate1() is passed from thread1 to InteractionRulesImpl. The server object is
shown with a dashed life line to show existentiality. It need not be alive until all
the thread events of the robot component object (rotate1(), extendArm1(),
takeBlank(), rotate2()) have happened. The server life line is a solid line where
it must exist (live), that is, when the client object sends a robotReady() message
to the server.

Two extensions to the MSC notation were introduced: One is a textual an- 
notation - XOR to show mutual exclusion. Beside the annotation is the second 
extension which is a timeline shown as a solid line that stretches between the two 
mutually exclusive methods. This timeline has a cap (see Figure 3) at the top 
and bottom of the line. The cap ensures clarity in the depiction of the scope of 
mutually excluded methods. The aim was also to integrate MSCs and statecharts 
in the development and testing process. The prototype test sequence generator
tool covers intra-component validation testing of distributed reactive systems.

3.2 Live State Chart and Test Sequence Generation 

The Live State Chart is a visual dynamic UML model, and this model is used in
the automation of test sequence generation. A textual extended BNF notation
is used to provide a Grammar for the Live State Chart. The UML metamodel is
used to provide a textual description for the Live State Chart. The Live State
Chart of the distributed reactive component (the Robot example) is transformed into
an event tree using JavaCC tools. A prototype implementation tool for the
automated test sequence generation is derived from the event tree of the Live State
Chart. The algorithm developed for automating the generation of test sequences for
a distributed reactive system is described in [1].

As shown in Figure 4, the test sequence generation phase involves the fol- 
lowing: 

— Live State Chart (LSC) with Temporal Action Propagation (TAP) sequences

— Textual Model of LSC with TAP that conforms to the Grammar defined
here

— JavaCC tools (JJTree and JavaCC) 

— test sequence generator tool 

Fig. 4. Test sequences generation phase




Fig. 5. Live state chart with TAP (Robot Example) 



The essential (core) requirements of the testing environment considered are: 
the specification model to be fed to the JavaCC tools in the context of a Gram- 
mar, creation of a functional test sequence generator tool, and a system under 
test for testing against the input test sequences. 

The rest of the infrastructure, such as the test oracle and the output trace
programs, deals with the diagnosability aspects and is required to provide a
fully functional automated specification tool. However, the test execution phase
of this environment (Figure 6) is not automated in [1]; automating it is the focus
of our present work.

The Live State Chart’s event activation sequences and the extensions introduced
in the MSC to show mutually exclusive methods in a previous subsection are
revisited here to introduce TAP. Temporal operators with the formalisms from TLA
are shown explicitly with event/action sequences in the LSC. In order to reduce
the clutter in the body of the diagram, a temporal action propagation (TAP)
list was introduced. TAP lists the event action propagation sequence for the
transition with the appropriate temporal operator (as shown in the rectangular
attachment entitled TAP shown as part of the Live State Chart in Figure 5).



Fig. 6. Testing environment 

In the TAP in Figure 5, a0e1 means always 0 eventually 1. The operators
always (a) and never (n) are used for mutually exclusive methods, and hence
represent safety properties. Although never (n) constraints are not shown in the
body of the live state chart, they are part of the testing requirements. Hence,
never (n) constraints are included in the TAP list. The extension to MSC (see
Figure 3) shows the mutual exclusion condition (always and never) clearly. This
information from the MSC has been used to include the never (n) constraints in the
TAP list. The liveness properties are shown as eventually (e) true (t) or false (f).
For an eventually (e) constraint in the live state chart, both (t) and (f) are
included in the TAP list to accommodate eventually true and eventually false
conditions. The Live State Chart has the temporal action propagation (TAP) sequence
list, which lists for each activation sequence:

— a sequence number to indicate the order in which the event occurrences are
considered

— the event action sequence IDs which match the IDs included in the body of
the Live State Chart plus the default start event

— the true and false activation for an eventually true or false event

— temporal constraint number 

— temporal property and 

— the transition label. 

From the description of the Live State Chart with TAP for a Robot example,
a generalised, abstract model of the Live State Chart with TAP has been
derived in [1]. This abstract model definition is one of the steps in the setting up
of a repeatable validation testing method (VISWAS) for a distributed reactive
component.

3.3 Testing Environment in VISWAS Method 

Our testing environment in VISWAS supports such an automated process (see
Figures 1 and 6), and includes:

— the test sequence generation phase, and

— System Under Test. 

In order to support an automated validation process, the test execution phase 
of the test environment must include the following: 

— The system under test accepts the automated test sequences under the
control of the test oracle program, which runs the system under test with
the test data provided by the test sequence generator.

— The actual results are compared against the expected results, and each test
is deemed a pass or a fail and reported in the output trace program; the
output trace is the mechanism by which the test oracle reports the verdict
for the tests that passed and those that failed.

The components of the test execution phase are described briefly. 

Test Oracle. A test oracle program verifies the behaviour of the test execution
by using the oracle data (expected results), together with the test input and
output (actual results). The accuracy of the behavioural verification depends on
how good the test oracle is.

A test oracle program can be built to use the functional test sequences
produced by the test sequence generator, which is described in the next section.
Once built and compiled, the test oracle can be executed against the system
under test to detect any violations of properties covered by the test sequences.
A test oracle is used either with an entire test sequence (test suite) or with a
single test case. The current design (as shown in Figure 6) allows the test oracle
to compare the outcome with the expected results. Further analysis of the test
results is conducted by the output trace program.

Output Trace. The output trace is used to collect the verdict from the test
oracle. The output trace should receive the input test sequences from the
automated test sequence generation phase. The test sequences are grouped into
valid and invalid sequences, and cover all the event intervals. It should also
obtain the actual result and the expected result from the test oracle. Hence,
the output trace also contains coverage metrics for the system under test. The
output trace mechanism is therefore useful in further analysis of test results.
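
The interplay between the test oracle and the output trace can be illustrated
with a minimal Python sketch. It is not the VISWAS implementation: the names
SequenceCase, OutputTrace, run_oracle and toy_sut, the record layout, and the
stand-in system under test are all assumptions made for illustration. The
oracle runs each generated sequence against the system under test, compares
actual with expected results, and hands the verdict to the output trace.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence


@dataclass
class SequenceCase:
    """One generated test sequence with its oracle data (hypothetical layout)."""
    name: str
    inputs: Sequence[str]     # event sequence fed to the system under test
    expected: Sequence[str]   # oracle data: the expected responses
    valid: bool = True        # generator's grouping into valid/invalid sequences


@dataclass
class OutputTrace:
    """Collects one verdict record per executed test case."""
    records: List[dict] = field(default_factory=list)

    def report(self, case: SequenceCase, actual: Sequence[str], verdict: str) -> None:
        self.records.append({
            "test": case.name,
            "valid": case.valid,
            "expected": list(case.expected),
            "actual": list(actual),
            "verdict": verdict,
        })

    def summary(self) -> dict:
        passed = sum(r["verdict"] == "pass" for r in self.records)
        return {"pass": passed, "fail": len(self.records) - passed}


def run_oracle(sut: Callable[[Sequence[str]], Sequence[str]],
               suite: Sequence[SequenceCase],
               trace: OutputTrace) -> OutputTrace:
    """Run every case against the SUT and record a pass/fail verdict."""
    for case in suite:
        actual = sut(case.inputs)
        verdict = "pass" if list(actual) == list(case.expected) else "fail"
        trace.report(case, actual, verdict)
    return trace


def toy_sut(events: Sequence[str]) -> List[str]:
    """Stand-in system under test: acknowledges everything except 'stop'."""
    return ["nak" if event == "stop" else "ack" for event in events]


if __name__ == "__main__":
    suite = [
        SequenceCase("t1", ["start", "move"], ["ack", "ack"]),
        SequenceCase("t2", ["stop"], ["ack"], valid=False),
    ]
    print(run_oracle(toy_sut, suite, OutputTrace()).summary())
```

Keeping the verdict records separate from the comparison step mirrors the
division of labour described above: the oracle decides, the trace only stores
what is needed for later analysis.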

4 Capturing Diagnosis Flow in MSC 

We revisit MSCs here to show how the work of Ladkin et al. [10] described in [1] and
Padilla’s [5] work on an execution semantics for MSCs can be used in describing 
a set of possible traces which are treated here as test sequences and diagnosis 
flows. 

4.1 MSC to Next Event/Signal (ne/sig) Relation Graphs 

Ladkin et al. [10] use ne/sig graphs as an abstract syntactic representation for
MSCs. ne/sig graphs have two kinds of edges: next event (ne) and signal
(sig) edges. These edges represent signals and the progression of processes
between events. The nodes represent events and are labeled with the event type.
The event node sending a message of type a at the sig edge is labeled !a, and
the event node receiving that message of type a is labeled ?a. A ne/sig graph has
start nodes labeled Top and end nodes labeled Bottom (Figure 7).




source: Ladkin and Leue 1994 (FDT VI Proceedings) 



Fig. 7. An msc and its corresponding ne/sig graph 



MSCs have vertical lines drawn from each object/process/thread. Events are
ordered temporally from top to bottom. The horizontal lines in the graph
represent an event at which signals of the specified type are sent or received
by the object/process/thread. The system is terminated when all the
objects/processes/threads have terminated. To translate an MSC into a ne/sig
graph, events are drawn as nodes, and the signal edges become relations on the
nodes (Figure 7). Two types of signal edges are needed to handle both
asynchronous and synchronous communication. Therefore, there are two signal
relations on graph nodes. The state transition relation on these two types of
nodes in the ne/sig graph is defined differently, and whether an action is
enabled depends on whether it is an asynchronous or a synchronous action. In
synchronous communication, the send and receive events occur simultaneously,
communication is atomic, and both events block until both are ready.

MSCs can contain labeled conditions, shown in Figure 8 as a rectangle spanning
horizontally across the object/process/thread axis. In this figure, there is
a condition label C at the top and bottom. The MSC may be joined to itself,
creating an iterative non-terminating loop, with signal a alternating with
signal b. Translating MSCs with conditions into ne/sig graphs involves
introducing extra condition nodes on each process axis, then joining the graphs
at these nodes, and unfolding an MSC into a single ne/sig graph (Figure 8).

The triple, called the global state transition graph (GSTG), with global states
(Q), the start state (q0), and the state transition function (Tm) is defined for
the ne/sig graph to derive a finite-state automaton. The triple <S1,w,S2> is
a transition relation. Say, in the state labeled S1, an event of type !a at
node w is enabled, as node w denotes a send node. Node x is not enabled, as its
send has not been taken in S1. Since w is enabled, the event corresponding to it
may be taken next to enter a new state S2. The GSTG is annotated with the list
of actions enabled (en()) and taken (ta()) in each state (Figure 9). Ladkin
et al. [10] proposed that MSC specifications may be enhanced by stating which
liveness properties (weak or strong fairness) must be satisfied for a
given specification. We used these ideas in defining an extended MSC in [1] for
deriving automatic test sequences.

adapted from Ladkin and Leue 1994 (FDT VI Proceedings)

Fig. 8. Iterations in msc (msc 2) and its corresponding ne/sig graph

source: Ladkin and Leue 1994 (FDT VI Proceedings)

Fig. 9. Global state transition graph for msc 2

We revisit Figure 7 here to describe possible diagnostic sequences. For
each instance, the vertical time axis shows a total order among events called
the instance order. There is a one-to-one correspondence between the sending and
receiving of each message signal (relation), represented by arrows as shown in
Figures 7 and 8 [5]. The event of sending message !a is related to the receiving
event ?a. An MSC defines a partial ordering of events composed from the instance
order and the send-receive relation, and describes a set of possible
test/diagnostic sequences. Figure 7, with its explicit representation of send
and receive signals, describes a set of traces that can be used in diagnostic
sequences:
trace 1: !a, ?a, !b, ?b; trace 2: !a, ?a, !b, !c, ?c

Using this notation in conjunction with the extended MSC proposed in [1] 
and the UML2.0 test profile provides us with the ability to capture test and 
diagnosis information with UML. 
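
How a partial order of events yields a set of traces can be sketched in
Python. The MSC used here is hypothetical (it is not the MSC of Figure 7):
instance P sends a and then b, and instance Q receives a and then b, with
asynchronous communication assumed so that each send only has to precede its
receive. The function name linearizations and the event encoding are
illustrative choices. Every total order that respects both the instance order
and the send-receive relation is one candidate test or diagnostic sequence.

```python
from typing import Dict, List, Set, Tuple


def linearizations(events: List[str],
                   order: Set[Tuple[str, str]]) -> List[List[str]]:
    """Return every total order of `events` consistent with `order`.

    A pair (x, y) in `order` means that x must occur before y.
    """
    preds: Dict[str, Set[str]] = {event: set() for event in events}
    for before, after in order:
        preds[after].add(before)

    traces: List[List[str]] = []

    def extend(prefix: List[str]) -> None:
        if len(prefix) == len(events):
            traces.append(list(prefix))
            return
        done = set(prefix)
        for event in events:
            # An event may be taken once all of its predecessors are done.
            if event not in done and preds[event] <= done:
                prefix.append(event)
                extend(prefix)
                prefix.pop()

    extend([])
    return traces


# Hypothetical MSC: P sends a then b; Q receives a then b.
EVENTS = ["!a", "!b", "?a", "?b"]
INSTANCE_ORDER = {("!a", "!b"), ("?a", "?b")}   # order along each vertical axis
SEND_RECEIVE = {("!a", "?a"), ("!b", "?b")}     # each send precedes its receive

if __name__ == "__main__":
    traces = linearizations(EVENTS, INSTANCE_ORDER | SEND_RECEIVE)
    for number, trace in enumerate(traces, start=1):
        print(f"trace {number}: " + ", ".join(trace))
```

For this small MSC the enumeration yields two traces, one in which both sends
happen before any receive and one in which the messages are handled strictly
in turn.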
5 Discussion

5.1 Testability and Diagnosability with Industry Standards 

Researchers working on improvements to software quality attributes have
produced novel solutions to address testability and diagnosability concerns.
Software slicing and dicing are diagnosis techniques from researchers such as
[11,12] that focus on software code at the unit and integration levels. Another
diagnosis method is the use of assertions [13] to locate faulty states in code.
The authors of [14,15,16,17] use diagnosability to measure the expected effort
and difficulty in fault localization, and to help improve design quality. A
number of these researchers use SDL and the Testing and Test Control Notation
(TTCN-3).

UML is also a widely used industry standard, and for software components
modelled using UML it makes sense to use it for test description as well.
The primary author [1] has used extended MSCs to describe tests for temporal
properties. A test description language, !TeLa!, is introduced in [18] to be
used as input by the executable test generator and to synthesise tests specified
in a UML-based formalism. The authors of [18] use UML interaction diagrams to
describe tests for temporal ordering properties.

Also, the problems associated with temporal ordering have been explored in
the telecom domain and in conformance testing of telecom protocols (see ISO
Standard Testing Language TTCN [19]). TTCN-3 [20,3] is the only standardised
language for the specification and implementation of test cases. [7,21] describe a 
graphical presentation format for TTCN-3 (GFT). GFT is based on MSCs and 
UML and extends it with test specific concepts such as verdicts and defaults. 
GFT is also the basis for the definition for UML2.0 Test Profile. 

[22,23,24] have described parallel test architectures and have used MSCs and
activity diagrams to show the testing and diagnosis process. We have used the
Parallel Architecture for Component Testing (PACT) [22] for building test cases
(refer to
http://www.esse.monash.edu.au/sitar/se_educ_proj/casestudies/game/PACT.html
using Netscape); the PACT architecture parallels the architecture of production
classes. A test case may have four outcomes: pass, fail, inconclusive, or abort
without verdict to cater for a priori exception cases. UML2.0 Test Profile, GFT,
and !TeLa! have similar diagnosis features.

A number of papers [25,26,4] point out that in electronic and other complex
hardware systems, artificial intelligence is employed as a primary component in
system test and verification. However, this has led to a proliferation of AI
design, test and diagnostic tools, and the lack of standard interfaces between
reasoning systems has led to an increase in product life cycle costs. The
primary purpose of standardization efforts such as the Artificial Intelligence
Exchange and Services Tie to All Test Environment (AI-ESTATE) for the IEEE 1232
family and the standard on testability and diagnosability metrics, P1522, is to
facilitate the development of diagnostic tools and systems that can be widely
used and are predictable. The integrated diagnostics conceptual model proposed
in [27] provides direct ties to the testability and diagnosability standard,
P1522. The model depicts relationships between test requirements, tests and
outcome. The outcome is based on diagnostic rules with which conclusions are
drawn with levels of confidence. Diagnosis can point to failures or faults, and
corrective actions are undertaken. [27] argues for diagnostic components to be
constructed according to standards to facilitate competition in the market place
in terms of risks, cost and quality. Standards also imply a maturity in the
underlying technology, thus adding to the level of confidence.

We draw inspiration from such standards work in our ongoing work on de- 
riving design and test models in the context of industry standards such as UML 
models, UML2.0 test profile, TTCN-3, GFT and !TeLa! for test and diagnosis. 

We are currently investigating various avenues for further work in this area. 
Some of these areas are: 

— building an automated test oracle (diagnostics tool) to capture diagnostics 
(pass/fail) status of test inputs (from the automated test sequence genera- 
tor). 

— analysing the diagnostic information to measure the goodness of the design 
and effectiveness of test strategies using the above or existing diagnostic 
tools. 
Abstract. We present work on a tool environment for model-based testing with
the Abstract State Machine Language (AsmL). Our environment supports semi- 
automatic parameter generation, call sequence generation and conformance 
testing. We outline the usage of the environment by an example, discuss its un- 
derlying technologies, and report on some applications conducted in the Micro- 
soft environment. 



1 Introduction 

Over the last two decades, the area of formal software modeling has been extensively 
explored, developing various methods, notations and tools. Formal specification lan- 
guages like VDM, Z, B, CSP, ASM etc. have been developed and applied to numer- 
ous problems. Verification technology has had success in certain areas, in particular if 
based on model checking. However, in spite of promising results, a widely expected 
break-through of these technologies has not yet appeared. 

The goal of our group at Microsoft Research is to bring rigorous, formal modeling 
to praxis, trying to avoid (suspected) obstacles of earlier approaches to formal model- 
ing. We have developed the Abstract State Machine Language (AsmL), an executable 
modeling language based on the ASM paradigm [1] and fully integrated into the 
.NET framework and Microsoft development environment. 

One important application we see for AsmL is automated testing. A huge amount 
of work is spent on testing in Microsoft’s and other companies’ product cycle today. 
Models not only enhance understanding what a product is supposed to do and how its 
architecture is designed, but enable one to semi-automatically derive test scenarios at 
an early development stage where coding has not yet finished. Given manually or 
automatically generated test scenarios, formal models can be used to automate the test 
oracle. A great advantage of model-based testing is seen in its adaptability: during the 
product cycle, various versions of the product are published at milestones, each of 
which requires thorough testing. Whereas manual test suites and harnesses are hard to 
adapt to the variations of the product, a model makes this work easier. 

We have developed an integrated tool environment for model-based testing with
AsmL. This environment comprises the following technologies:

- Parameter generation for providing method calls with parameter sets;

- FSM generation for deriving a finite state machine from a (potentially infinite)
abstract state machine;

- Sequence generation for deriving test sequences from the FSM;

- Runtime verification for testing whether an implementation conforms to the
model.

Our environment realizes a semi-automatic approach, requiring a user to annotate 
models with information for generating parameters and call sequences, and to config- 
ure bindings between model and implementation for conformance testing. This anno- 
tation process is supported by a GUI. The approach is novel, to the best of our knowl- 
edge, in its combination as well as in many of its ingredients. In this paper, we will 
discuss the environment’s methodology and underlying implementation by a walk- 
through of an example. 



2 The Abstract State Machine Language 

Space constraints prevent us from giving a systematic introduction to AsmL; instead 
we rely on the readers’ intuitive understanding of the language as used in the exam- 
ples. AsmL is a fusion of the Abstract State Machine paradigm and the .NET com- 
mon language runtime type system. From a specification language viewpoint, one 
finds the usual concepts of earlier specification languages like VDM or Z. The lan- 
guage has sets, finite mappings and other high level data types with convenient and 
mathematically-oriented notations (e.g., comprehensions). From the .NET integration 
viewpoint, AsmL has all the ingredients of a .NET language, namely interfaces, struc- 
tures, classes, enumerations, methods, delegates, properties and events. The close 
embedding into .NET allows AsmL to interoperate with any other .NET language and 
the framework: AsmL models can call out into frameworks and AsmL models can be 
called and referred to from other .NET languages, up to the level that e.g. an AsmL 
interface (with specification parts) can be implemented by a .NET language, enabling 
checking that the interface contract is obeyed [3]. 

The most distinctive feature of AsmL is its foundation on Abstract State
Machines (ASM) [1]. An ASM is a state machine that in each step computes a set
of updates of the machine's variables. Upon the completion of a step, all
updates are "fired" (committed) simultaneously; until that happens, updates are
not visible, supporting a side-effect free view on the computation inside a
step. The computation of an update set can be complex, and the number of updates
calculated is not statically bounded. Control flow of the ASM is described in
AsmL in a programmatic, textual way: there are constructs for parallel
composition, sequencing of steps, non-deterministic (more exactly, random)
choice, loops, and exceptions. On an exception, all updates are rolled back,
enabling atomic transactions to be built from many sub-steps.
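
The step semantics described above can be mimicked in a few lines of Python.
This is a sketch of the idea only, not of the AsmL runtime: the names
asm_step, try_step and InconsistentUpdate are hypothetical, a state is
modelled as a plain dictionary, and treating two conflicting updates to the
same location as an error is an assumption of this sketch. All rules read
the old state, their updates are collected into one update set, and the set
is committed in one go.

```python
from typing import Any, Callable, Dict, Iterable

State = Dict[str, Any]
Rule = Callable[[State], Dict[str, Any]]


class InconsistentUpdate(Exception):
    """Two rules tried to write different values to the same location."""


def asm_step(state: State, rules: Iterable[Rule]) -> State:
    """Evaluate every rule against the old state, then commit all updates."""
    updates: Dict[str, Any] = {}
    for rule in rules:
        for location, value in rule(state).items():
            if location in updates and updates[location] != value:
                raise InconsistentUpdate(location)
            updates[location] = value
    new_state = dict(state)     # the old state is never modified in place
    new_state.update(updates)   # all updates become visible together
    return new_state


def try_step(state: State, rules: Iterable[Rule]) -> State:
    """Atomic step: on any exception the original state is kept."""
    try:
        return asm_step(state, rules)
    except Exception:
        return state


if __name__ == "__main__":
    swap = [lambda s: {"x": s["y"]}, lambda s: {"y": s["x"]}]
    print(asm_step({"x": 1, "y": 2}, swap))
```

Because both rules read the state as it was before the step, the two updates
swap x and y without a temporary variable, which is the practical consequence
of updates staying invisible until they are committed.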

AsmL supports meta-modeling which allows a programmatic exploration of the
non-determinism in the model and dealing with state as a first-class citizen (i.e., the
current state is accessible as a normal value that can be manipulated just as any other
data value). This allows us to realize various state exploration algorithms for AsmL
models, including explicit state model-checking and in particular test generation and
test evaluation.

AsmL documents are given in XML and/or in Word and can be compiled from 
Visual Studio .NET or from Word; the AsmL source is embedded in special 
tags/styles. Conversion between XML and Word (for a well-defined subset of styles) 
is available. This paper is itself a valid AsmL document; it is fed directly into the 
AsmL system for executing the formal parts it contains or for working with the AsmL 
test environment. 



3 Example: Web Shop 



Throughout this paper, we will use as an example a simplified model of a web shop. 
Our web shop allows clients to order gifts like flowers or perfume using the common 
shopping cart metaphor. Real-world details are heavily abstracted in this example to 
make it comprehensible (we should emphasize at this point that our approach scales 
to richer examples; see Sect. 0 for applications in the Microsoft environment). 

The web shop’s items are introduced below: 

enum Item
  Flowers
  Perfume

const prices as Map of Item to Integer =
  { Flowers -> 30, Perfume -> 20 }

A shopping cart is represented as a bag (multi-set) of items: 

type Cart = Bag of Item 

A client to the web shop is described by the class below. A client has an identifier 
and a session state, given by its shopping cart. If the client is not in a session the cart 
is null (The type T? in AsmL denotes a type where null is an allowed value; by de- 
fault, types in AsmL do not contain null): 

class Client
  const id as String
  var cart as Cart? = null
  override ToString() as String?
    return id

The state of the web shop model is given by a set of clients: 

var clients as Set of Client = { } 

We now define the actions of the clients. A client can be constructed in which case 
he is added to the set clients ; a client can enter the shop (if he is not in a session), 
can add an item to his cart (if he is in a session), or remove an item (if he is in a ses- 
sion and the item is on his cart). Finally, a client can checkout, obtaining the bill and 
ending his session (In this simplified model, the client can only leave the shop by 
paying): 
class Client

  Client(id as String)
    require not exists client in clients
                       where client.id = id
    add me to clients

  EnterShop()
    require cart = null
    cart := Bag of Item()

  AddToCart(item as Item)
    require cart <> null
    cart := cart.Include(item)

  RemoveFromCart(item as Item)
    require cart <> null and then item in cart
    cart := cart.Exclude(item)

  Checkout() as Integer
    require cart <> null and then cart.Size > 0
    var bill as Integer = 0
    step foreach item in cart
      bill := bill + prices(item)
    step
      cart := null
      return bill



4 FSM and Sequence Generation 

The AsmL tool environment allows generating a finite state machine from models
such as that for the web shop. From the FSM, call sequences can be generated
using standard techniques [4]. The FSM is generated by exploring the state space
of the model, much as an explicit-state model checker does [5]. Starting at the
initial state, enabled actions are fired, leading to a set of successor states,
from where the exploration is continued. An action here is a shared or an
instance-based method; parameters to this method (including the instance object
if necessary) are provided by a configurable parameter generator (see Sect. 0).
An action is enabled if the method's precondition (require) is true in the
current state.

Various ways are available to prune the exploration. Pruning is strictly necessary 
for infinite models (like the one for the web shop where a client could add items 
again and again to the cart). But pruning might be also required for large finite mod- 
els in order to focus on certain test purposes. The AsmL environment provides a col- 
lection of different pruning techniques; the most important are: 

- State abstraction: state abstractions map a concrete state to an abstract state. Ex- 
ploration stops when a state is reached whose abstract equivalent has already been 
seen. 

- Filters: a filter allows excluding certain states from exploration; only those states 
that pass the filter are considered for continuation during exploration. 

- Model coverage: a percentage of model branch coverage can be given; exploration 
stops when this coverage is reached. 

We illustrate the FSM generation for the web shop example. First we have to
provide suitable definitions for the parameter domains of the actions. The
actions of interest here are the client constructor and the instance methods of
a client for entering a shop, adding and removing items, and checking out. The
parameters required are identifiers for clients, client objects and items. For
the first one we provide a given set of names like a, b, c and so on. For the
client object domain it is natural to actually use the model variable clients
itself: it provides in each state the set of clients created so far. For the
items, finally, we use the domain as given by the enumeration. We discuss the
configuration of parameter domains in greater detail in the next section. For
now it is important to note that the configured parameter domains can depend on
the dynamic state of the model. Thus, as clients are created, the domain for the
instance parameter, given by the model variable clients, grows.

Once we have configured parameter domains, we define the variables and actions 
of the state machine, and add a so-called abstraction property for pruning the state 
exploration. The state abstraction properties group the concrete states into equiva- 
lence classes; exploration is stopped if we see a concrete state for which an equivalent 
one has been already seen before. 

Finding the right abstraction property is a creative task and requires experience and 
trial and error. If the purpose of the generated FSM is to create scenarios for adding 
and removing two different items by just one client the following property does fine: 

property SomeItemsInCart as Set of (Bag of Item)?
  get return
    { (if client.cart <> null then
         client.cart * Bag{Flowers, Perfume}
       else null) | client in clients }

This property maps the state space of the model into a set of carts, for each client 
one cart in the set; it does not distinguish from which client the cart comes. Each cart 
in turn is pruned to not contain more than one Flowers and one Perfume item (we 
use multi-set intersection for this purpose: for example, {a , a , b} * {a} = {a}). 
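
The effect of multi-set intersection can be reproduced with Python's
collections.Counter, whose & operator keeps the minimum count of each element.
The helper names bag and cap_cart are hypothetical and only serve this
illustration; they are not part of AsmL.

```python
from collections import Counter


def bag(*items: str) -> Counter:
    """Build a multi-set (bag) from the given items."""
    return Counter(items)


def cap_cart(cart: Counter, cap: Counter) -> Counter:
    """Multi-set intersection: each item keeps the smaller of its two counts."""
    return cart & cap


if __name__ == "__main__":
    # The example from the text: {a, a, b} * {a} = {a}
    print(sorted(cap_cart(bag("a", "a", "b"), bag("a")).elements()))

    # A cart with three Flowers and one Perfume is pruned to one of each.
    cart = bag("Flowers", "Flowers", "Flowers", "Perfume")
    print(sorted(cap_cart(cart, bag("Flowers", "Perfume")).elements()))
```

Intersecting with a fixed bag therefore acts as a cap on how many copies of
each item the abstraction can distinguish.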

Here, we want to further prune the state space by filtering out states with more
than one client. We use the following filter:

property AtMostOneClient as Boolean
  get return Size(clients) <= 1

The complete configuration for the web shop is shown in the screenshot in Figure 
1. The domains part of the configuration contains annotations of model elements for 
parameter domains. The state machine part contains annotations for variables and 
actions of the state machine as well as the abstraction property. 

Given this configuration, we generate an FSM as shown in Figure 2. Only one cli- 
ent will be created in this FSM, since our abstraction property does not distinguish 
from which client a cart comes (and hence if a second client enters the shop, no dif- 
ference is seen in the abstract state to the first client). In the FSM, S3 is associated 
with the state where the client’s cart is empty, S4 where the client has Perfume on 
his cart, S5 where he has Flowers on his cart, and S6 where he has both. Among 
these states, various transitions exist, adding and removing items. 
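
The exploration with a state abstraction and a filter can be approximated by
the following Python sketch. It is a simplified stand-in for the AsmL tool,
under several assumptions: the state encoding, the client name domain NAMES,
the breadth-first order, and the decision to record edges that lead into
already seen abstract states are all choices of this sketch, so the resulting
graph need not match Figure 2 state for state.

```python
from collections import Counter, deque

ITEMS = ("Flowers", "Perfume")
NAMES = ("a", "b")      # assumed domain for client identifiers
CAP = Counter(ITEMS)    # Bag{Flowers, Perfume}

# A concrete state is a sorted tuple of (client id, cart) pairs; a cart is
# None (no session) or a sorted tuple of items (a bag).


def enabled_actions(state):
    """Yield (label, successor) for every action whose precondition holds."""
    carts = dict(state)

    def with_cart(cid, cart):
        new = dict(carts)
        new[cid] = None if cart is None else tuple(sorted(cart))
        return tuple(sorted(new.items()))

    for cid in NAMES:
        if cid not in carts:                 # constructor: id must be unused
            yield f"new Client({cid})", with_cart(cid, None)
    for cid, cart in carts.items():
        if cart is None:                     # EnterShop requires cart = null
            yield f"{cid}.EnterShop()", with_cart(cid, ())
            continue
        for item in ITEMS:
            yield f"{cid}.AddToCart({item})", with_cart(cid, cart + (item,))
            if item in cart:
                rest = list(cart)
                rest.remove(item)
                yield f"{cid}.RemoveFromCart({item})", with_cart(cid, rest)
        if cart:                             # Checkout requires a non-empty cart
            yield f"{cid}.Checkout()", with_cart(cid, None)


def abstraction(state):
    """SomeItemsInCart: the set of capped carts, forgetting client identity."""
    return frozenset(
        None if cart is None else tuple(sorted((Counter(cart) & CAP).elements()))
        for _, cart in state)


def at_most_one_client(state):
    """Filter: only states with at most one client are explored."""
    return len(state) <= 1


def explore(initial=()):
    """Breadth-first exploration that stops at already seen abstract states."""
    seen = {abstraction(initial)}
    edges = set()
    frontier = deque([initial])
    while frontier:
        state = frontier.popleft()
        for label, succ in enabled_actions(state):
            if not at_most_one_client(succ):
                continue
            src, dst = abstraction(state), abstraction(succ)
            edges.add((src, label, dst))
            if dst not in seen:
                seen.add(dst)
                frontier.append(succ)
    return seen, edges


if __name__ == "__main__":
    states, transitions = explore()
    print(len(states), "abstract states,", len(transitions), "transitions")
```

Although a client could add items forever, the exploration terminates because
a concrete state is expanded only when its abstract image has not been seen
before.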

From an FSM as shown we generate test sequences using the well-known FSM 
traversal techniques (we use a variation of the transition tour method based on an 
algorithm from [6]). For the shown FSM we get a single traversal with 19 steps. 
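
A traversal that covers every transition of an FSM can be sketched as
follows. This is a simple greedy tour, not the algorithm from [6] and not
necessarily a shortest tour: from the current state it repeatedly takes a
shortest path that ends with a transition not yet covered. The function names
and the dictionary encoding of the FSM are assumptions made for this
illustration, and the small FSM in the example is hypothetical.

```python
from collections import deque


def path_to_uncovered(fsm, source, uncovered):
    """Breadth-first search for a shortest path ending in an uncovered step."""
    queue = deque([(source, [])])
    visited = {source}
    while queue:
        state, path = queue.popleft()
        for action, nxt in fsm.get(state, []):
            step = (state, action, nxt)
            if step in uncovered:
                return path + [step]
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, path + [step]))
    return None


def transition_tour(fsm, start):
    """Return a list of (state, action, next_state) covering all transitions.

    `fsm` maps each state to a list of (action, next_state) pairs.
    """
    uncovered = {(s, a, t) for s, outs in fsm.items() for a, t in outs}
    tour, current = [], start
    while uncovered:
        path = path_to_uncovered(fsm, current, uncovered)
        if path is None:
            raise ValueError("some transitions cannot be reached any more")
        for step in path:
            tour.append(step)
            uncovered.discard(step)
        current = path[-1][2]
    return tour


if __name__ == "__main__":
    # Hypothetical three-state FSM in the spirit of the web shop.
    shop = {
        "S0": [("EnterShop", "S1")],
        "S1": [("AddFlowers", "S2"), ("AddPerfume", "S1")],
        "S2": [("RemoveFlowers", "S1"), ("Checkout", "S0")],
    }
    for state, action, nxt in transition_tour(shop, "S0"):
        print(f"{state} --{action}--> {nxt}")
```

The resulting sequence of actions is one call sequence that exercises every
transition at least once, provided the FSM is strongly connected.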
Fig. 1. Configuration



The simple example of the web shop can produce much richer FSMs. The follow- 
ing property allows for more items: each client can buy up to two flowers and two 
perfume sets: 

property MoreItemsInCart as Set of (Bag of Item)?
  get let maxitems = Bag{
            Flowers, Flowers, Perfume, Perfume }
      return { if client.cart <> null then
                 client.cart * maxitems
               else null
             | client in clients }

The FSM generated from this abstraction property and a filter that restricts the
number of clients to 4 consists of around 900 relevant transitions (transitions
leading to a new state under the abstraction); 6000 transitions have been tried
out to find these transitions, and the construction time was around 4 minutes
with a maximal memory footprint of 110 MB. Such an FSM is not feasible to
visualize as a whole; however, with the methodology we are proposing, one first
experiments with a smaller abstract state space to understand the abstraction
and then scales up the parameters for the actual generated FSM and test suite.
Fig. 2. Generated FSM



5 Parameter Generation 

The AsmL test environment uses a parameter generator based on access driven filter- 
ing (ADF) which is an enhancement of an existing framework called Korat [7]. ADF 
can generate values of recursive value types and object graphs. Given a predicate and 
a domain configuration, ADF generates all non-isomorphic valid inputs whereby an 
input is regarded as valid if the predicate holds. The domain configuration contains 
descriptions of finite sets as the domains of basic types, and information about how to 
generate objects of class types and elements of value types, and imposes bounds on 
the size of the generated input. The domain configuration classifies the domain of 
each type into one of the following three categories. 

- Defined Domain: A defined domain is given by an arbitrary AsmL expression 
which is evaluated in the scope of the model. It can depend on the dynamic state of 
the model. 

- Inherited Domain: An inherited domain is composed of domains as they are for 
other types. That is, an inherited domain just refers to one or more types, and the 
union of the domains of these types constitutes the inherited domain. A typical ap- 
plication of inherited domains is abstract types. The domain of those types is natu- 
rally the union of the domains of all of its subtypes. 
- Generated Domain: The domain of a class or value type will be generated by
ADF. ADF must be given a domain configuration for each field of the type in one
of the three ways described here. A bound on the maximal number of
objects/elements of the type in a single input can be imposed. Finally, each
field is assigned a cost; all assignments to fields in a given input are summed
up to compute the cost of this input. The predicate is configured to have a
maximal cost.

ADF exhaustively finds all valid inputs which are within the bounds imposed by 
the domain configuration. To this end, ADF considers the parameters of the predicate 
and the fields of generated domains as free variables. ADF executes the predicate 
with an input which initially only consists of the parameters of the predicate as free 
variables. Whenever the execution of the predicate accesses a free variable, then ADF 
will instantiate this variable by choosing an object or value that is allowed by the 
domain configuration (thus the name access driven filtering). If the bounds imposed 
by the domain configuration are exceeded or the predicate returns false, then this 
assignment of the free variables is discarded. Otherwise, if the predicate returns true, 
any instantiations for the remaining free variables can be chosen to create a valid 
input. By exhaustively exploring all choices that are possible when instantiating free 
variables ADF will find all valid inputs within the given bounds. 

Two kinds of bounds are imposed on generated domains by the domain configura- 
tion. 

- A maximal number of objects/elements of a single type: No input will contain 
more objects/different elements of a single type. This is an effective bound if only 
a small number of generated domains are involved. 

- As an extension to Korat, a maximal accumulated cost along field accesses is 
maintained. The intuition behind this bound is that one often wants to generate 
asymmetric inputs which tend to be more complex only in certain areas. In this 
case, one would assign low costs to fields which lead to the desired complex areas, 
and high costs to fields which lead to areas which should not be considered. ADF 
stops the generation of bigger inputs when the accumulated costs exceed a given 
maximal cost. 

As an example of ADF’s usage, suppose our web shop allows inputting search queries 
which are simple boolean expressions over string literals. These can be defined in 
AsmL as below (where an AsmL structure is a value type which allows recursion): 

abstract structure Query

structure Literal extends Query
  literal as String

structure Conjunction extends Query
  left as Query
  right as Query

structure Disjunction extends Query
  left as Query
  right as Query

Suppose we want to generate those queries as parameter inputs which are in dis- 
junctive normal form. We define a filter predicate as below: 
Fig. 3. Domain Definitions


IsShallowDNF(query as Query) as Boolean 
  match query 
    q as Conjunction: 
      return not q.left is Disjunction and then 
             not q.right is Disjunction 
    q as Disjunction: 
      return IsShallowDNF(q.left) and then 
             IsShallowDNF(q.right) 
    q as Literal: 
      return true 

Since ADF inductively generates input by instantiating free variables in already 
generated input, the shallow DNF test as above is sufficient for generating a tree 
which is in full DNF. 
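The argument can be checked on a small scale with the following Python sketch. It assumes, as an interpretation of the paragraph above, that the filter is applied to every newly built term and that new terms are only built from terms that already passed the filter; `generate` and `is_full_dnf` are hypothetical helpers, and the dataclasses merely mirror the AsmL structures.

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class Literal:
    literal: str

@dataclass(frozen=True)
class Conjunction:
    left: object
    right: object

@dataclass(frozen=True)
class Disjunction:
    left: object
    right: object

def is_shallow_dnf(q):
    """Python transcription of the IsShallowDNF predicate."""
    if isinstance(q, Conjunction):
        return not isinstance(q.left, Disjunction) and not isinstance(q.right, Disjunction)
    if isinstance(q, Disjunction):
        return is_shallow_dnf(q.left) and is_shallow_dnf(q.right)
    return True

def is_full_dnf(q, under_conjunction=False):
    """Reference check: no disjunction occurs anywhere below a conjunction."""
    if isinstance(q, Literal):
        return True
    if isinstance(q, Disjunction):
        return not under_conjunction and is_full_dnf(q.left) and is_full_dnf(q.right)
    return is_full_dnf(q.left, True) and is_full_dnf(q.right, True)

def generate(rounds, literals=("a", "b")):
    """Build terms inductively, keeping only those that pass the shallow filter."""
    terms = {Literal(s): None for s in literals}  # ordered, duplicate-free
    for _ in range(rounds):
        parts = list(terms)
        for left, right in product(parts, repeat=2):
            for node in (Conjunction(left, right), Disjunction(left, right)):
                if is_shallow_dnf(node):
                    terms[node] = None
    return list(terms)

queries = generate(2)
```

Taken on its own, the shallow predicate accepts a term such as `Conjunction(Conjunction(Disjunction(a, b), a), b)`; that term never arises in the sketch because its inner conjunction is rejected before it can be used as a building block.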

Our test environment allows the configuration for parameter generation to be 
annotated through a GUI. Domains can be defined on a per-type basis, a 
per-field/parameter basis, or a per-method basis. For the query example, the 
configuration is given in Figure 3. 

The super-type Query inherits its domain from the union of the configurations of its 
sub-types (the full definition is not displayed because of window size). The sub-types 
are generated using ADF, where the recursive fields point back to the configuration 
of the Query super-type. The recursion causes no problem because the input size is 
bounded; we allow 2 instances for literals and conjunctions, and 1 instance for 
disjunctions. The run of the parameter generator results in 76 parameter combinations 
which are of the obvious shape. 
6 Conformance Testing 

The AsmL test environment allows interactively configuring bindings between a 
model and an implementation, instrumenting the model as a test oracle. The imple- 
mentation can be given as any managed .NET assembly, written in any of the .NET 
languages. A wizard supports the binding of model classes and methods with imple- 
mentation classes and methods by signature matching. 

To enable conformance testing, the implementation assemblies are rewritten on the 
intermediate code level inserting callbacks for monitored methods to the runtime 
verification engine. This engine is able to deal with non-determinism in the model by 
maintaining a set of admissible model behaviors. Each time a monitored method is 
called in the implementation, its parameters and output result will be propagated to 
the conformance test manager. For each of the currently possible model states, the 
corresponding model method will be called. If the method contains nondeterministic 
behavior, several resulting states can arise. The resulting states of those calls which 
produce a conformant output constitute the set of next model states. If this set be- 
comes empty, the conformance test fails. In addition to comparing just method re- 
sults, a predicate which relates the model and implementation state can be employed, 
which may prune the state-space evolution earlier than by just observing method 
return values. 
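The state-set bookkeeping described above can be sketched as follows in Python. This is a simplified illustration, not the runtime verification engine itself: model methods are assumed to be functions that return every admissible `(result, next_state)` pair, and the lossy channel used as the model is a hypothetical example.

```python
def conformance_step(states, model_method, args, observed_result):
    """Run the model method on every admissible state; keep conformant successors."""
    next_states = set()
    for state in states:
        for result, new_state in model_method(state, *args):
            if result == observed_result:
                next_states.add(new_state)
    return next_states

def check_trace(initial_state, model, trace):
    """Return True if the observed implementation trace conforms to the model."""
    states = {initial_state}
    for method, args, observed in trace:
        states = conformance_step(states, model[method], args, observed)
        if not states:
            return False  # no model behavior explains the observation
    return True

# Hypothetical nondeterministic model: a channel that may silently drop messages.
def send(state, msg):
    return [("ok", state + (msg,)), ("ok", state)]

def receive(state):
    return [(state[0], state[1:])] if state else [(None, state)]

model = {"send": send, "receive": receive}
lossy_run = [("send", (1,), "ok"), ("send", (2,), "ok"), ("receive", (), 2)]
```

In `lossy_run` the implementation delivers message 2 first, which the model explains by assuming that message 1 was dropped; the set of admissible states shrinks accordingly after the `receive` call.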

The problem of relating object identities is dealt with as follows. A mapping from 
model to associated implementation objects is maintained. Whenever a monitored 
implementation method returns an object, the object returned by the corresponding 
model method must either map to exactly that object in the mapping, or no entry in 
the mapping exists, in which case one is created. One can think of this mechanism as 
treating object identities in the model as distinct logical variables which are 
"bound" to the associated object identities of the implementation. 
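A minimal Python sketch of this binding rule is given below. The class name `ObjectBinding` and its method are hypothetical, and the sketch implements only the rule stated above (bind on first sight, compare by identity afterwards); any further checks the actual tool may perform are not modeled.

```python
class ObjectBinding:
    """Maps model objects to the implementation objects they were first seen with."""

    def __init__(self):
        self._impl_for = {}

    def conforms(self, model_obj, impl_obj):
        """Bind model_obj on first use; afterwards require the same implementation object."""
        bound = self._impl_for.setdefault(model_obj, impl_obj)
        return bound is impl_obj

# Hypothetical objects returned by a model method and an implementation method.
binding = ObjectBinding()
model_session, impl_session, other_impl = object(), object(), object()
first = binding.conforms(model_session, impl_session)   # creates the binding
again = binding.conforms(model_session, impl_session)   # same pair: still conformant
clash = binding.conforms(model_session, other_impl)     # different object: violation
```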

7 Discussion and Conclusion 

We presented aspects of a first version of an integrated environment for model-based 
testing with AsmL and illustrated its use by an example. The environment combines 
and refines the techniques for parameter generation, ESM generation, call sequence 
generation, and conformance testing in a novel way. We conclude by discussing 
applications, related work, and future work. 

7.1 Applications 

Though the AsmL test environment is still in a prototypical stage, it has been applied 
in several non-trivial projects at Microsoft. 

- The parameter generator has been used for testing an implementation of the XPath 
language. The stateless model of XPath used for that purpose consists of around 33 
pages. More than a million tests have been generated, out of which the system 
identified 120 test cases which already resulted in 90% model code branch coverage. 
To achieve full model code branch coverage the test engineer added 10 tests 
manually. The recovery of the manual test cases was easy since the system tracks 
branches which haven’t been covered. 

- The FSM and sequence generator has been used for testing web-services protocols, 
among them reliable messaging (RM). The model for RM consists of around 40 
pages. The FSM generator produces a machine with around 1500 transitions out of 
30000 possible in a couple of minutes, simulating various kinds of wire failure and 
recovery operations. 

- Within the few months of its introduction, the AsmL test environment has gained 
considerable interest in the model-based testing community at Microsoft. Model-based 
testing using finite state machine models has been in use at Microsoft for quite 
some time (a couple of hundred people are registered for the internal mailing list, 
to give an impression). The more powerful approach provided by AsmL is being 
investigated by many of these users, and we expect a couple of new applications in the 
near future. 

7.2 Related Work 

Our approach to parameter generation is based on and extends the work found in [7], 
which is a later branch of the work of the authors of [8]. Whereas the authors use a 
Java data type to describe what they call the "finitization", we use a richer interactive 
method for what we call "domain configuration". Other extensions of our approach 
include a cost function for the generation of recursive domains and detection of iso- 
morphisms for value types. 

The conformance testing conducted by our tool environment can be classified as 
grey box testing. Traditional FSM based testing techniques with either Mealy or 
Moore machines typically amount to black box testing, where the actual states of the 
implementation are unobservable. In contrast, our testing approach allows the user to 
specify conformance relations connecting the model state to the state of the imple- 
mentation, in addition to pure input/output behavior reflected at the API level. In 
other words, the tool may be used to perform a limited form of white box testing, 
where the limitations depend on what part of the implementation state is accessible. 
(If no state is accessible, the tool still works, but may not be able to detect errors 
as early as they occur.) This approach is possible because the tool architecture is 
based on the intermediate language platform provided by the .NET runtime, which 
allows binary-level access to the state of the implementation. 

The basic FSM generation algorithm that is implemented in the test tool has been 
significantly extended since its first description in [5]. Test case generation that is 
performed on the basis of the generated FSM can be classified as a T-method [13]. 
We have not considered utilizing more powerful methods, such as the U-, D-, or 
W-methods [13] used in pure black box testing. 

One of the first automated techniques for extracting FSMs from model-based 
specifications for the purpose of test case generation, introduced in [9], is based on a 
finite partitioning of the state space of the model using full disjunctive normal forms. 
While our partition of the state space is similar to that of the DNF approach, the two 
approaches are quite different. Most importantly, the DNF approach employs symbolic 
techniques while we build the FSM by executing the specification. This enables 
us to support the full spectrum of AsmL, including call-outs from the model into 
framework code. 

In [10] projections on state machines are used to restrict them for a certain test 
purpose; also filters on states are used. This is related to our pruning technique of 
state exploration, though we never look at the larger FSM but generate the projected 
one from the beginning. 

In model checking, data abstraction is used to cope with state explosion when the 
original model M is too large. Data abstraction groups states of M and produces a 
reduced model M^ which is analogous to the FSM produced in our test environment 
by using properties. However, whereas in model checking operations need to be lifted 
to the abstract domain as well, which is the fundamental difficulty there, we still work 
with the operations on the concrete data, which can be realized using full AsmL. Due 
to efficiency considerations, the standard data abstraction algorithms of model- 
checking may yield an over-approximation of M^; see [11]. In contrast, our approach 
may yield an under-approximation of the true abstraction, in other words some transi- 
tions may be missing, but there are no false transitions, which is important for using 
the FSM for test case generation. 
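The difference can be made concrete with a toy exploration in Python. The sketch assumes one plausible reading of the approach, namely that the model is executed on concrete states and that a concrete state is explored further only if its abstract state has not been seen before; the function `explore` and the bounded-counter model are hypothetical and do not reproduce the algorithm of [5].

```python
from collections import deque

def explore(initial, actions, step, abstract):
    """Build an FSM over abstract states by executing the concrete model."""
    seen = {abstract(initial)}
    transitions = set()
    frontier = deque([initial])
    while frontier:
        state = frontier.popleft()
        for action in actions:
            target = step(state, action)
            if target is None:
                continue  # action not enabled in this concrete state
            transitions.add((abstract(state), action, abstract(target)))
            if abstract(target) not in seen:
                # Only the first concrete representative of an abstract state is explored.
                seen.add(abstract(target))
                frontier.append(target)
    return seen, transitions

# Hypothetical model: a counter between 0 and 3, abstracted to three properties.
def step(n, action):
    if action == "inc":
        return n + 1 if n < 3 else None
    return n - 1 if n > 0 else None

def abstract(n):
    return "empty" if n == 0 else "full" if n == 3 else "partial"

found_states, found = explore(0, ["inc", "dec"], step, abstract)
exact = {(abstract(n), a, abstract(step(n, a)))
         for n in range(4) for a in ("inc", "dec") if step(n, a) is not None}
```

Every transition in `found` was actually executed on concrete data and therefore also belongs to `exact`, but the abstract state "full" is never reached, so transitions into and out of it are missing.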

In general, model checking techniques have been considered in the context of 
ASM based test case generation; in [12] the counter examples of SPIN are considered 
as test cases generated from a given ASM and a given property. The technique of 
using a model-checker with negated goal-states, and then letting the model-checker 
produce a counter-example which can be interpreted as a test sequence to reach this 
state, has been proposed by many other authors for automatic test generation. We 
believe this approach is highly restricted, on the one hand by the input language re- 
strictions most model-checkers have to obey, on the other hand because tailored 
search machines for finding tests can be more efficient. For example, a model checker 
used to generate tests finds just one test sequence per exploration whereas our ap- 
proach finds all test sequences in one exploration of the ASM. 

Currently our tool supports the Rural Chinese Postman Tour method to traverse the 
generated FSM. For an efficient implementation of the postman tour the tool uses the 
algorithm for Maximal Weight Bipartite Matching given in [6]. In general, the test 
methodology of the tool is an extension of the FSM approach. The bulk of the work 
in this area has dealt with deterministic FSMs. See [4, 13] for comprehensive surveys 
and [14] for an overview of the literature. The Extended Finite State Machine 
(EFSM) approach has been introduced mainly to cope with the state explosion prob- 
lem of the FSM approach. Typically the problem arises when the system to be mod- 
eled has variables with values in large, even infinite, domains, for example integers. 
In an EFSM, such variables are allowed, and the transitions may depend on and up- 
date their values; see [15] [16]. In EFSMs, the control part is finite and is separated 
from the data part, which distinguishes them from ASMs. An interesting problem in 
our FSM generation algorithm is how to adjust the properties in order to avoid 
non-determinism. This problem is related to the stabilization problem of EFSMs [16]. The 
use of input/output FSMs for fault coverage based test case generation is studied in 
[17]. The specification FSMs in [17] are (possibly non-deterministic) Mealy 
machines. 

Conformance testing plays a central role in testing communication protocols where 
it is important to have a precise model of the observable behavior of the system. This 
has led to a testing theory based on labeled transition systems. See an overview of 
the approach in [18] and an overview of related literature in [19]. Labeled transition 
systems are in general nondeterministic. In the LTS approach, verification techniques 
can be used to deal with state explosion and to generate test cases. TGV [20] is an 
industrial tool that utilizes the LTS approach to generate test cases from SDL specifi- 
cations. Fault model based FSM testing methodology has been recently considered 
for labeled transition systems as well [21,31]. 

There are many different groups doing work related to runtime verification. Per- 
haps the closest is the JML runtime assertion checking provided for components writ- 
ten in Java [22]. Eiffel [23] also provides for the checking of pre- and post- 
conditions, but only for components written in Eiffel. There are many similar design- 
by-contract tools for Java, such as JMSAssert [24], iContract [25], Handshake [26], 
Jass [27], and JContract [28]. However, all lack any facility for maintaining the 
state-space separation between the specification and the implementation. More general 
component-oriented work has been done by Edwards [29] to generate wrapper components 
for checking pre- and post-conditions, but this approach cannot handle more general 
synchronization issues that require model programs. 

7.3 Future Work 

Several extensions of the AsmL test environment are on the way. High priority on our 
agenda is dealing with non-determinism in the model. Though we can handle 
non-determinism on the level of runtime verification, the test generator cannot deal 
with it, not least because its output, sequences, is not a suitable representation. We are 
looking at two different approaches. One promising approach is on-the-fly testing 
[30], which in our setting amounts to fusing the ESM generation with conformance 
testing. This approach has the advantage that non-determinism of the model is imme- 
diately pruned by the decisions of the implementation. However, in our experience 
some user groups require the tests as data in their development process. For these 
applications, we look at generating DAGs (directed acyclic graphs) instead of se- 
quences. Test cases for non-deterministic systems are usually tree structures (as one 
form of DAG). A further topic of future work is employing symbolic computation by 
means of constraint resolution, lifting restrictions of our approach implied by comput- 
ing with ground data. 